Locating and Augmenting Object Features in Images

ABSTRACT

A computer-implemented method and system are described for augmenting image data of an object in an image, the method comprising receiving captured image data from a camera, storing a plurality of augmentation image data defining a respective plurality of augmentation values to be applied to the captured image data, storing a plurality of augmentation representations, each representation identifying a respective portion of augmentation image data, selecting one of said augmentation image data and one of said augmentation representations based on at least one colourisation parameter, determining a portion of the augmentation image data to be applied based on the selected augmentation representation, augmenting the captured image data by applying said determined portion of the augmentation image data to the corresponding portion of the captured image data, and outputting the augmented captured image data.

FIELD OF THE INVENTION

This invention relates to an image processing system, and more particularly to techniques for locating and augmenting object features in images.

BACKGROUND OF THE INVENTION

Choosing a new cosmetic product is often a tedious and time consuming process, and is usually only possible in a retail environment where samples are made available. An important consideration for a customer trying on any new product is seeing how it looks as they move around, taking momentary opportunities to view themselves wearing the cosmetic from particular angles or with particular expressions.

Utilising the mass availability of handheld, or other, computing devices to make real-time virtual try-on of new cosmetics possible in any environment has the potential to radically change the way the customer finds the perfect product. Three main challenges for any such system are: first, locating and tracking the features of a subject in a live captured image data stream; second, augmenting a virtual cosmetic product realistically in place over the live images; and third, doing all this in real-time, particularly on devices having limited hardware capabilities.

Feature tracking systems are generally known, in which tracking of an identified person or object in a captured scene is performed based on established image processing techniques. For example, one well-known technique for object shape modelling is Active Shape Modelling (ASM), as discussed for example in “Lip Reading Using Shape, Shading And Scale”, Mathews, Cootes, Cox, Harvey and Bangham, and “An Introduction To Active Shape Models”, Tim Cootes. Another well-known technique for object appearance modelling is Active Appearance Modelling (AAM), as discussed, for example, in “Active Appearance Models”, Cootes, Edwards and Taylor.

However, conventional feature tracking systems are not efficient and typically require significant computational overheads, for example for off-line training of object models and subsequent live tracking of objects based on the trained models. Moreover, techniques such as AAM perform well on conventional per-person models but are slow, not robust enough and unable to generalize for data not included in the training set.

What is desired are improved techniques for feature tracking and augmentation that address these challenges.

STATEMENTS OF THE INVENTION

Aspects of the present invention are set out in the accompanying claims.

In one aspect, the present invention provides a computer-implemented method of augmenting image data, the method comprising receiving data of an image captured by a camera; receiving data identifying coordinates of a plurality of labelled feature points defining a detected object in the captured image; storing at least one augmentation image data defining a plurality of augmentation values to be applied to the captured image data; storing at least one augmentation representation including data defining at least one polygonal region of the augmentation image data, the or each polygonal region defined by three or more vertices, each vertex associated with a corresponding labelled feature point; determining a transformation of the at least one polygonal region of the representation based on the received coordinates of the corresponding feature points; applying the determined transformation to corresponding regions of the augmentation image data defined by the at least one polygonal region of the augmentation representation; and augmenting the captured image data by applying the transformed at least one portion of the augmentation image data to the corresponding portion of the captured image data.

The augmentation image data may comprise one or more of texture image data, data identifying one or more material properties, and a mathematical model to generate an array of augmentation values. The augmentation image data and the captured image data may have the same dimensions or different dimensions.

Coordinates of the vertices of the augmentation representation may be defined relative to pixel locations of the augmentation image data. Augmenting the captured image data may further comprise applying at least one image data adjustment to the transformed at least one portion of the augmentation image data. The at least one image data adjustment may comprise one or more of a highlight adjustment, a colour adjustment, a glitter adjustment, a lighting model adjustment, a blend colour adjustment, and an alpha blend adjustment.

Mask data may be stored defining at least one masked portion of the augmentation image data. The determined transformation may be applied to corresponding regions of the mask data defined by the at least one polygonal region of the augmentation representation. The at least one image data adjustment may comprise alpha blending said transformed masked data regions.

A stored augmentation image data and a stored augmentation representation may be selected based on at least one colourisation parameter. A plurality of stored augmentation image data and stored augmentation representations may be selected based on respective at least one colourisation parameters.

Augmenting the captured image data may comprise alpha blending the results of applying, for each selected augmentation representation in sequence, transformed at least one portions of each selected augmentation image data to the corresponding portion of the captured image data, and applying the alpha blended output to the captured image data.

According to another aspect, the present invention provides a computer-implemented method of generating the above augmentation representations, based on respective predefined mask data identifying coordinates of a plurality of masked pixels.

An augmentation representation may be generated by retrieving data defining a plurality of polygonal regions determined for the detected object, placing the retrieved plurality of polygonal regions over the respective mask data, identifying polygonal regions that include at least one masked pixel, and storing data representing the identified subset of polygonal regions. Each masked pixel may comprise a value representing a blend parameter.

The detected object in the captured image may be located by storing a representation of the object, the representation including data defining a first object model and a corresponding function that approximates variations to the first object model, and data defining at least one second object model comprising a subset of the data defining the first object model, and at least one corresponding function that approximates variations to the respective second object model; determining an approximate location of the object in the captured image, based on the first object model and its corresponding function; and refining the location of the object in the captured image by determining a location of a portion of the object, based on the at least one second object model and its corresponding function.

The first object model may comprise data representing locations of a plurality of feature points and the second object model comprises a subset of the feature points of the first object model. The first object model may define a shape of the whole object and the at least one second object model may define a shape of a portion of the object.

Determining the approximate location of the object in the image may comprise generating a candidate object shape based on the first object model and applying the corresponding function to determine an approximate location of the candidate object shape.

The candidate object may be split into one or more candidate object sub-shapes based on the at least one second object models. The location of the one or more candidate object sub-shapes may be refined based on the respective second object model and its corresponding function.

The corresponding functions may comprise regression coefficient matrices. The corresponding function that approximates variations to the second object model may comprise a plurality of cascading regression coefficient matrices. The location of the one or more candidate object sub-shapes may be iteratively refined based on the respective second object model and its corresponding plurality of cascading regression coefficient matrices.

According to yet another aspect, the present invention provides a computer-implemented method of augmenting image data, the method comprising receiving captured image data from a camera; storing a plurality of augmentation image data defining a respective plurality of augmentation values to be applied to the captured image data; storing a plurality of augmentation representations, each representation identifying a respective portion of augmentation image data; selecting one of said augmentation image data and one of said augmentation representations based on at least one colourisation parameter; determining a portion of the augmentation image data to be applied based on the selected augmentation representation; augmenting the captured image data by applying said determined portion of the augmentation image data to the corresponding portion of the captured image data; and outputting the augmented captured image data.

In one exemplary aspect, there is provided a computer-implemented method of locating an object in an image, the method comprising storing a representation of the object, the representation including data defining a first object model and a corresponding function that approximates variations to the first object model, and data defining at least one second object model comprising a subset of the data defining the first object model, and a corresponding function that approximates variations to the second object model; determining an approximate location of the object in the image, based on the first object model and its corresponding function; and refining the location of the object in the image by determining a location of a portion of the object, based on the second object model and its corresponding function.

The first object model may comprise data representing locations of a plurality of feature points and the second object model may comprise a subset of the feature points of the first object model. A region of an image that contains the object may be identified, wherein the approximate location of the object is determined within the identified region of the image.

The first object model may define a shape of the whole object and the at least one second object model may define a shape of a portion of the object. The approximate location of the object in the image may be determined by generating a candidate object shape based on the first object model and applying the corresponding function to determine an approximate location of the candidate object shape.

The candidate object may be split into one or more candidate object sub-shapes based on the at least one second object models. The location of the one or more candidate object sub-shapes may be refined based on the respective second object model and its corresponding function. The representation of the object may further comprise data defining computed mean and standard deviation statistics associated with position and scale of the first object model. The image may be transformed based on the computed statistics.

The location of a candidate object sub-shape may be refined by determining an object feature descriptor for the candidate object sub-shape based on the transformed image. The corresponding functions may comprise regression coefficient matrices. Exponential smoothing may be applied to the refined location of the object based on a prior location of the object determined from a previous image.

In another exemplary aspect, there is provided a computer-implemented method of generating the representation of an object based on a plurality of training images with corresponding data defining the location of the object therein, the representation including data defining a first object model and a corresponding function that approximates variations to the first object model, and data defining at least one second object model comprising a subset of the data defining the first object model, and a corresponding function that approximates variations to the second object model.

In a further exemplary aspect, there is provided a computer-implemented method of augmenting image data, the method comprising receiving data of an image captured by a camera, the captured image including a facial feature portion corresponding to at least one feature of a user's face; processing the captured image data to determine a subset of pixels of the received image data associated with the detected facial feature portion; calculating replacement pixel data values for the determined subset of pixels of the received image data, based on at least one selected augmentation parameter; and outputting data of an augmented version of the captured image, including the calculated replacement pixel data for the determined subset of pixels.

User input may be received identifying at least one selected facial feature. The selected facial feature may be the user's lips. A representation of the lips may be stored, the representation including data defining a model of the lips and a corresponding function that approximates variations to the model, and an approximate location of the lips may be determined in the captured image based on the model and its corresponding function. The location of the lips in the captured image may be iteratively refined.

The approximate location of the lips may be defined by coordinates of a plurality of labelled feature points, and the augmentation process may further include: storing augmentation image data defining augmentation values to be applied to a portion of the captured image data; determining a transformation of the augmentation image data based on the determined coordinates of the corresponding feature points; and augmenting the captured image data by applying the transformed augmentation image data to the corresponding portion of the captured image data.

In yet another exemplary aspect, there is provided an image processing method comprising detecting an area of an input image corresponding to a user's face; detecting an area of the input image corresponding to one or more facial features of a user; populating a foreground colour histogram with the frequency of occurrence of colour values in the detected facial feature area and populating a background colour histogram with the frequency of occurrence of colour values within the detected face area but outside of the detected facial feature area; generating a probability map based on a determination, from the colour values of pixels within the input image, of likelihood values representing the likelihood of pixels belonging to an image area of interest, the likelihood value for each colour value being determined from a combination of the foreground and background histograms; mapping foreground pixels having a likelihood value above a predetermined threshold to positions within a colour space, and determining a foreground colour cluster centre within the colour space for the mapped foreground pixels; mapping background pixels having a likelihood value below the predetermined threshold to positions within the colour space, and determining a background colour cluster centre for the mapped background pixels; reallocating mapped pixels between the foreground and background colour clusters based on relative proximity to the foreground and background colour cluster centres; updating the foreground and background histograms using the reallocated pixels; generating an updated probability map for an input image to be augmented based on a determination, from the colour values of pixels within the input image to be augmented, of likelihood values representing the likelihood of pixels belonging to an image area of interest, the likelihood value for each colour value being determined from a combination of the updated foreground and background histograms; and modifying the input image to be augmented to change the appearance of the user's facial features.
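
By way of a non-limiting illustration of the histogram-based probability map described in this aspect, the following Python sketch builds foreground and background colour histograms from a facial feature area and the surrounding face area, and combines them into per-pixel likelihood values. The function names, bin count and array layout are illustrative assumptions rather than details taken from the embodiments.

```python
import numpy as np

def build_histograms(image, face_mask, feature_mask, bins=32):
    """Populate foreground/background colour histograms (hypothetical helper).

    image        : HxWx3 uint8 array
    face_mask    : HxW bool array, True inside the detected face area
    feature_mask : HxW bool array, True inside the detected facial feature area
    """
    # Quantise colours into a coarse 3D histogram
    q = (image // (256 // bins)).reshape(-1, 3)
    flat_feature = feature_mask.ravel()
    flat_face = face_mask.ravel()

    fg = np.zeros((bins, bins, bins), dtype=np.float64)
    bg = np.zeros((bins, bins, bins), dtype=np.float64)
    np.add.at(fg, tuple(q[flat_feature].T), 1)
    np.add.at(bg, tuple(q[flat_face & ~flat_feature].T), 1)
    return fg, bg

def probability_map(image, fg, bg, bins=32):
    """Likelihood of each pixel belonging to the image area of interest."""
    q = (image // (256 // bins)).reshape(-1, 3)
    f = fg[q[:, 0], q[:, 1], q[:, 2]]
    b = bg[q[:, 0], q[:, 1], q[:, 2]]
    p = f / (f + b + 1e-9)          # combine the two histograms
    return p.reshape(image.shape[:2])
```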

In a further exemplary aspect, there is provided an image processing method, comprising the steps of identifying an area of an input image corresponding to one or more facial features of a user; and generating replacement colour values for pixels within the identified area of the image based on a combination of a highlight adjustment process and a glitter adjustment process.

In further aspects, the present invention provides a system comprising means for performing the above methods. In yet other aspects, there is provided a computer program arranged to carry out the above methods when executed by a programmable device.

BRIEF DESCRIPTION OF THE DRAWINGS

There now follows, by way of example only, a detailed description of embodiments of the present invention, with reference to the figures identified below.

FIG. 1 is a block diagram showing the main components of a feature tracking and colourisation system according to an embodiment of the invention.

FIG. 2 is a block diagram showing the main components of the texture model training module and colourisation module shown in FIG. 1 and the components of a trained texture model according to an embodiment of the invention.

FIGS. 3a, 3b, 3c, and 3d schematically illustrate examples of data processed and generated by the texture model training module during the training process.

FIG. 4 is a block diagram showing the main components of the shape model training module shown in FIG. 1 and the components of a trained shape model according to an embodiment of the invention.

FIG. 5 is a schematic illustration of an exemplary trained model including a global shape and a plurality of sub-shapes.

FIG. 6 is a flow diagram illustrating the main processing steps performed by the texture model training module of FIGS. 1 and 2 according to an embodiment.

FIGS. 7a, 7b, 7c, and 7d schematically illustrate further examples of data processed and generated by the texture model training module during the training process.

FIG. 8 is a flow diagram illustrating the main processing steps performed by the shape model training module of FIGS. 1 and 4 according to an embodiment.

FIG. 9 shows an example of user-defined shapes defined by a plurality of labelled feature points, displayed over a training image.

FIGS. 10a, 10b, and 10c schematically illustrate examples of global and sub-shape models generated by the training module according to an embodiment.

FIG. 11 is a flow diagram illustrating the processing steps performed by the shape model training module to compute statistics based on the object detector output and user-defined shape, according to an embodiment.

FIGS. 12a, 12b, 12c, 12d, and 12e show further examples of the processing steps performed by the shape model training module of FIG. 4.

FIGS. 13a and 13b show a flow diagram illustrating the main processing steps performed by the shape model training module of FIG. 4 to determine cascading regression coefficient matrices according to an embodiment of the invention.

FIG. 14 is a flow diagram illustrating the sub-processing steps performed by the training module to determine offset values and feature point descriptors based on a selected training image.

FIG. 15 is a flow diagram illustrating the main processing steps performed by the system of FIG. 1 to track and augment objects in a captured image according to an embodiment.

FIG. 16 is a flow diagram illustrating the processing steps of an initialization process performed by the tracking module.

FIG. 17 is a flow diagram illustrating the processing steps performed by the tracking module to refine an object shape according to an embodiment.

FIGS. 18a, 18b, 18c, 18d, and 18e show an exemplary sequence of display screens during the tracking process of FIG. 15.

FIG. 19 is a flow diagram illustrating the main processing steps performed by the colourisation module of FIGS. 1 and 2 to apply colourisation to image data according to an embodiment.

FIG. 20 shows examples of data that is processed by, and processing steps performed by, the colourisation module during the colourisation process of FIG. 19.

FIG. 21 schematically illustrates an exemplary sequence of data that may be processed by, and processing steps performed by, the transform module to determine transformation of mesh data.

FIGS. 22a, 22b, 22c, and 22d are schematic block flow diagrams illustrating the main components and processing flows for exemplary shader modules in the colourisation module.

FIG. 23 schematically illustrates an example process for generating a blurred version of the captured image data.

FIG. 24 is a diagram of an example of a computer system on which one or more of the functions of the embodiment may be implemented.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

Overview

Specific embodiments of the invention will now be described for a process of training object shape and texture models, a process of tracking detected objects based on trained object shape models, and a process of augmenting image data of the tracked objects based on trained object texture models. It will be appreciated that although the respective processes and associated processing modules are described as separate embodiments, aspects of the described embodiments can be combined to form further embodiments. For example, alternative embodiments may comprise one or more of the texture training, shape training, object tracking, and object colourisation and augmentation aspects described in the embodiments below.

Referring to FIG. 1, a tracking and augmenting system 1 according to an embodiment comprises a texture model training module 4 for processing a representative image in an image database 5 to generate and store trained texture models 30, as will be described in detail below. A shape model training module 3 may also be provided for processing training images in the image database 5 to generate and store trained shape models for use during real-time processing of input image data from a camera 9 by a tracking module 11. The processing of image data by the shape model training module 3 and texture model training module 4 may be referred to as “offline” pre-processing, as the training processes are typically carried out in advance of the “real-time” image processing by the tracking module 11. The tracking module 11 receives image data captured by the camera 9 and determines the location of an identified object in the image, defined by a plurality of labelled feature points. Image data of the tracked object is then augmented by a colourisation module 13, based on colourisation parameters 14, and output to a display 15.

The system 1 may be implemented by any suitable computing device of a type that is known per se, such as a desktop computer, laptop computer, a tablet computer, a smartphone such as an iOS™, Blackberry™ or Android™ based smartphone, a ‘feature’ phone, a personal digital assistant (PDA), or any processor-powered device with suitable user input, camera and display means. Additionally or alternatively, the display 15 can include an external computing device, such as a mobile phone, tablet PC, laptop, etc. in communication with a host device, for example via a data network (not shown), for example a terrestrial cellular network such as a 2G, 3G or 4G network, a private or public wireless network such as a WiFi™-based network and/or a mobile satellite network or the Internet.

Texture Model Training Module

The texture model training module 4 in the tracking and augmenting system 1 of the present embodiment will now be described in more detail with reference to FIG. 2, which shows the main elements of the texture model module 4 as well as the data elements that are processed and generated by the texture model module 4 for the trained texture models 30. Reference is also made to FIGS. 3a to 3d schematically illustrating examples of data that are processed and generated by the texture model training module 4 during the training process.

As shown in FIG. 2, the texture model training module 4 may include a mesh generator 6 that retrieves at least one reference image 8 from the training image database 5, for example as shown in FIG. 3a, and generates data defining a plurality of polygonal regions based on the retrieved reference image 8, collectively referred to as a normalised mesh 10. Each region is defined by at least three labelled feature points and represents a polygonal face of the two-dimensional normalised mesh 10. It is appreciated that the normalised mesh may instead define three-dimensional polygonal regions. The texture model training module 4 uses the same set of labelled feature points as the tracking module 11, so that vertex and texture coordinate data can be shared across a common reference plane. In one example, the mesh generator 6 is configured to receive data defining the location of labelled feature points in the or each reference image 8 as determined by the tracking module 11. In another example, the mesh generator 6 may prompt a user to input the location of each feature point for the or each reference image 8. FIG. 3b schematically illustrates a plurality of defined feature points overlaid on a representation of a reference image 8. Preferably, the reference image is a symmetrical reference face, in order to optimize texture space across all areas of the face where virtual makeup may be applied.

The texture model training module 4 may be configured to subsequently perform triangulation to generate a mesh of triangular regions based on the labelled feature points. Various triangulation techniques are known, such as Delaunay triangulation, and need not be described further. FIG. 3c schematically illustrates an example of a resulting normalised mesh 10 a generated from the reference image shown in FIG. 3a and the plurality of labelled feature points shown in FIG. 3b. Optionally, the mesh generator 6 may further prompt the user for input to optimize the normalised mesh 10 a, for example by reducing or increasing the number of triangles for a particular region of the reference image. FIG. 3d schematically illustrates an example of a resulting optimised version 10 b of the normalised mesh 10 a shown in FIG. 3c. Alternatively, the mesh generator 6 may be configured to facilitate manual triangulation from the labelled feature points to generate an optimal normalised mesh 10. It will be appreciated that in the context of the present embodiment, an optimal normalised mesh 10 consists of triangles that stretch in their optimum directions causing the fewest artefacts, resulting in a mesh that defines an ideal number of vertices and polygonal faces to be used for the application of virtual makeup as described below.

In the present embodiment, the normalised mesh 10 includes a first data array consisting of an indexed listing of the labelled feature points defined by x and y coordinates relative to a common two-dimensional reference plane, such as the pixel locations of the texture image data 19, and a second data array consisting of a listing of polygon faces defined by indices of three or more labelled feature points in the first data array. For example, the first data array may be an indexed listing of m vertices: [x₀, y₀, x₁, y₁, . . . , x_m, y_m], each index corresponding to a different labelled feature point. The second data array may be a listing of n exemplary polygon faces: [1/2/20, 1/21/5, . . . , 92/85/86], each polygon face defined by indices of three vertices in the first data array. The normalised mesh 10 data can be stored in an object model database 7 of the system 1. It is appreciated that the normalised mesh 10 may be defined at a different scale from the texture image data 19, and an additional processing step can be used to compute the necessary transformation.
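
As an informal illustration of the two data arrays described above, a normalised mesh might be held in memory along the following lines (the field names and values are hypothetical, not part of the embodiment):

```python
# A minimal sketch of the normalised mesh structure; names and values are illustrative.
normalised_mesh = {
    # First data array: indexed listing of labelled feature points,
    # as (x, y) pixel coordinates in the common reference plane.
    "vertices": [
        (120.0, 310.0),   # index 0
        (132.5, 305.0),   # index 1
        # ... one entry per labelled feature point
    ],
    # Second data array: polygon faces, each defined by the indices of
    # three (or more) vertices in the first array, e.g. "1/2/20".
    "faces": [
        (1, 2, 20),
        (1, 21, 5),
        # ...
    ],
}
```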

The texture model training module 4 also includes an optimisation module 12 that generates a plurality of optimised texture models 16, based on the normalised mesh 10 retrieved from the object model database 7 and data defining one or more user-defined masks 14 a, retrieved from the image database 5 for example. Each texture model 16 generated by the optimisation module 12 includes data defining the associated mask 14 b, such as a copy of or pointer to the image data defining the respective user-defined mask 14 a, and an optimised mesh 18 comprising a subset of the polygonal regions of the normalised mesh 10 that is determined based on the associated mask 14 b, as will be described in more detail below. In this way, the optimisation module 12 can be used to take a given makeup mask and output only the necessary polygonal faces that are to be used by the colourisation module 13 to render the respective portions of the augmented image data.

Many masks can be compounded together to produce a particular desired virtual look or appearance, which consists of multiple layers of virtually applied makeup, including for example one or more of lipstick, blusher, eye shadow and foundation, in multiple application styles. The masks 14 a may include black and white pixel data. Preferably, the masks 14 a are grey-scale image data, for example including black pixels defining portions of a corresponding texture data file 19 that are not to be included in the colourisation process, white pixels defining portions of the corresponding texture data file 19 that are to be included at 100% intensity, and grey pixels defining portions of the corresponding texture data file 19 that are to be included at an intensity defined by the associated grey value. The white and grey pixels are referred to as the masked data regions. In this way, different masks 14 a can be provided for various blurring effects.
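
A minimal sketch of how a grey-scale mask value might modulate the intensity at which the corresponding texture pixels contribute, assuming 8-bit RGBA texture data and an 8-bit mask; this is illustrative only, as the embodiment performs the equivalent operation through the shader modules described below:

```python
import numpy as np

def apply_mask_intensity(texture_rgba, mask_grey):
    """Modulate texture alpha by the grey-scale mask value (sketch).

    Black mask pixels (0) exclude the texture, white pixels (255) include
    it at full intensity, and grey values give intermediate intensities.
    """
    weight = mask_grey.astype(np.float32) / 255.0     # HxW weights in [0, 1]
    out = texture_rgba.astype(np.float32)
    out[..., 3] *= weight                              # scale the alpha channel
    return out.astype(np.uint8)
```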

The texture data 19 may include texture image data and data identifying one or more associated material properties. Additionally or alternatively, the texture data 19 may include a mathematical model that can be used to generate an array of augmentation values to be applied by the colourisation module 13 to the captured image data. The texture image data 19 may have the same dimensions as the captured image data received from the camera. Alternatively, where the texture image data 19 has different dimensions from the captured image data, such as defining details of a portion of the overall face, metadata can be provided to identify the location of the texture portion relative to the pixel locations of a captured image and/or reference image 8.

Colourisation Module

The colourisation module 13 in the tracking and augmenting system 1 of the present embodiment will now be described in more detail, again with reference to FIG. 2, which also shows the main elements of the colourisation module 13 as well as the data elements that are processed by the colourisation module 13 to generate augmented image data that is output to the display 15. In this embodiment, a plurality of texture data files 19 are also stored in the object model database 7, defining image data of respective associated image augmentations that can be applied to the captured image data by the colourisation module 13.

As shown, the colourisation module 13 includes a plurality of shader modules 22 that determine and apply image colourisation to selected regions of texture data files 19. For example, four custom virtual makeup shader modules 22 can be implemented by the colourisation module 13, each having a respective predefined identifier, and used to determine and apply image colourisation to represent virtual application of lipstick, blusher, eye shadow and foundation to the captured image data. The output of a custom makeup shader module 22 is sent to a renderer 24 that augments the underlying user's face in the captured image from the camera 9 with the specified virtual makeup. As will be described in more detail below, each shader module 22 can be based on predefined sets of sub-shader modules to be applied in sequence, for example based on selected sets of colourisation parameters 26.

As shown in FIG. 2, predefined sets of colourisation parameters 26 can be stored in a colourisation parameters database 28, each set 26 including one or more predefined property values 26-1, predefined texture values 26-2, such as respective identifiers of a stored texture model 16 and a stored texture data file 19, and a predefined shader type 26-3, such as an identifier of a shader module 22 implemented by the colourisation module 13. The colourisation parameters database 28 may be a database of beauty product details, for example, whereby each product or group of products is associated with a respective set of colourisation parameters 26. Alternatively, the database 28 may include colourisation parameters 26 derived from product details retrieved from such a product database.

The colourisation module 13 also includes a transform module 20 that receives data defining the location of labelled feature points in the common reference plane, determined by the tracking module 11 for a captured image. The determined coordinates from the camera image data define the positions of the polygonal regions of the normalised mesh 10 that match the detected object, the user's face in this embodiment. The transform module 20 determines a mapping from the vertices of a selected region of an optimised mesh 18 to vertices of the corresponding tracked labelled feature points. The transform module 20 uses the determined mapping to transform the associated regions of mask data 14 b and texture data 19 retrieved from the object model database 7 for the particular set of colourisation parameters 26, into respective “warped” versions that are processed by the shader modules 22.
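
The per-region warping performed by the transform module can be pictured as estimating an affine mapping for each triangle of the optimised mesh, from its vertex coordinates in the common reference plane to the tracked feature point coordinates, and applying it to the corresponding mask and texture pixels. The sketch below uses OpenCV purely for illustration; the embodiment performs the equivalent operations through GPU functions.

```python
import numpy as np
import cv2

def warp_triangle(src_image, src_tri, dst_tri, out_size):
    """Warp one triangular region from reference coordinates to tracked
    feature point coordinates (illustrative sketch).

    src_tri, dst_tri : 3x2 arrays of triangle vertex coordinates
    out_size         : (height, width) of the output image
    """
    m = cv2.getAffineTransform(np.float32(src_tri), np.float32(dst_tri))
    return cv2.warpAffine(src_image, m, (out_size[1], out_size[0]))
```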

After all of the regions and colourisation parameters are processed by the transform module 20 and defined shader modules 22, the renderer 24 overlays the selected optimised meshes 18 according to the common reference plane, and in conjunction with an alpha blended shader sub-module (not shown), performs an alpha blend of the respective layers of associated regions of warped texture data. The blended result is an optimised view of what will be augmented onto the user's face. The final result is obtained by the renderer 24 applying the blended result back onto the user's face represented by the captured image data from the camera 9, and output to the display 15.
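
The layer compositing step can be summarised as a conventional back-to-front alpha blend of the warped texture layers over the captured frame, sketched below; the array shapes and value ranges are assumptions.

```python
import numpy as np

def composite_layers(frame_rgb, warped_layers):
    """Alpha blend warped RGBA texture layers, in sequence, over the frame.

    frame_rgb     : HxWx3 float array in [0, 1]
    warped_layers : list of HxWx4 float arrays in [0, 1]
    """
    out = frame_rgb.astype(np.float32).copy()
    for layer in warped_layers:                 # applied in the configured order
        alpha = layer[..., 3:4]
        out = layer[..., :3] * alpha + out * (1.0 - alpha)
    return out
```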

In this way, the colourisation module 13 uses the image data coordinates from the reference face, referenced by the optimised meshes 18, as texture coordinates to the texture data files 19, for each texture model 16 associated with a respective set of colourisation parameters 26 for a selected virtual makeup product, transformed according to the tracked feature point locations, and rendered over the captured image data, resulting in the visual effect of morphing all of the selected virtual makeup products to the user's face in a real-time augmented reality display. It will be appreciated that the transform module 20, shader modules 22 and renderer 24 will include calls to a set of predefined functions provided by a Graphics Processing Unit (GPU) of the system 1.

Advantageously, the present embodiment provides for more efficient GPU usage, as only the required portions of the respective texture data files and captured image data are transmitted to the GPU for processing.

Shape Model Training Module

An exemplary shape model training module 3 in the tracking and augmenting system 1 will now be described in more detail with reference to FIG. 4, which shows the main elements of the shape training module 3 as well as the data elements processed and generated by the shape training module 3 for the trained shape models 31. As shown, the shape model training module 3 includes a shape model module 21 that retrieves training images 23 and corresponding user-defined feature points 25 from the training image database 5. The shape model module 21 generates and stores a global shape model 27 and a plurality of sub-shape models 29 for a trained object model 31 in the object model database 7, as will be described in more detail below. It will be appreciated that a plurality of trained object models may be generated and stored in the object model database 7, for example associated with respective different types of objects.

FIG. 5 is a schematic illustration of an exemplary trained shape model including a global shape 27 and a plurality of sub-shapes 29. As shown, the exemplary data structure of the shape model is an array of (x,y) coordinates, each coordinate associated with a respective feature point of the global shape 27, corresponding to a respective labelled feature point 25 in the training data. Each sub-shape model 29 may be associated with a respective subset of the (x,y) coordinates, each subset thereby defining a plurality of feature points 25 of the respective sub-shape. The subsets of feature points 25 for each sub-shape may overlap.
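
Purely for illustration, the trained shape model data structure described above might be laid out as follows, with a flat list of (x, y) coordinates for the global shape and index subsets for each sub-shape; all names and values are hypothetical.

```python
trained_shape_model = {
    # Global shape: one (x, y) coordinate per labelled feature point.
    "global_shape": [(120.0, 310.0), (132.5, 305.0), (145.0, 307.5)],
    # Sub-shapes reference subsets of the global feature point indices;
    # the subsets may overlap, as noted above.
    "sub_shapes": {
        "lips_mouth_chin": [1, 2],
        "eyes_brows_nose_outline": [0, 1],
    },
}
```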

In this exemplary embodiment, the image database 5 stores a plurality of training images 23, each training image 23 comprising the entire face of a subject, including one or more facial features such as a mouth, eye or eyes, eyebrows, nose, chin, etc. For example, the training images 23 may include subject faces and facial features in different orientations and variations, such as front-on, slightly to one side, closed, pressed, open slightly, open wide, etc.

The shape model training module 3 may include an appearance sub-shape module 33 that can be used to generate sub-shape appearance models 35 for one or more of the sub-shape models 29, for example based on pre-defined sub-shape detailed textures. The sub-shape detail textures may be pre-prepared grey scale textures, for example for the lip, cheek and eyes of a subject face. Different textures may be used to implement different appearance finishes, for example glossy, matt, shiny etc. The process of generating a sub-shape appearance model structure can involve warping (through piecewise affine transformations) an image representing the sub-shape detailed texture to the mean shape specified by the corresponding sub-shape model 29. A combined sub-model module 37 can be provided to generate a sub-shape combined model 39 from a sub-shape model 29 and a corresponding sub-shape appearance model 35.

In this embodiment, the shape model training module 3 also includes a statistics computation module 41 that computes and stores mean and standard deviation statistics based on the plurality of global shape models 27 of the trained models 31 generated by the shape model module 21 and the output of the object detector module 42. The computed statistics can advantageously provide for more robust, accurate and efficient initial positioning of an object that is to be located within the bounding box output by an object detector module 42. In the present exemplary embodiment, the object detector module 42 can implement any known face detector algorithm.

A regression computation module 43 of the shape model training module 3 generates a global shape regression coefficient matrix 45 based on the global shape 27 generated by the shape model module 21, and at least one sub-shape regression coefficient matrix 47 for each sub-shape 29 generated by the shape model module 21. As is known in the art, the regression coefficient matrices 45, 47 define an approximation of a trained function that can be applied, for example during a tracking phase, to bring the features of a candidate object shape from respective estimated locations to determined “real” positions in an input image. The generation of regression coefficient matrices 45, 47 in the training process therefore defines respective trained functions which relate the texture around an estimated shape to the displacement between the estimated positions and the final positions where the shape features are truly located. The regression computation module 43 can be configured to compute the respective regression coefficient matrices 45, 47 based on any known regression analysis technique, such as principal component regression (PCR), linear regression, least squares, etc. The plurality of regression coefficient matrices 45, 47 form parts of the trained model 31 stored in the object model database 7.
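
The way a trained regression coefficient matrix is used can be sketched as a simple update rule: texture features sampled around the current shape estimate are multiplied by the matrix to predict a displacement towards the true feature locations, and a cascade repeats the step at each level. This is a hedged illustration of the general technique rather than the exact implementation; the descriptor function is assumed.

```python
import numpy as np

def refine_shape(shape, image, regression_matrices, describe):
    """Apply a cascade of regression coefficient matrices to a shape estimate.

    shape               : (2m,) flattened (x, y) feature point estimate
    regression_matrices : list of (2m, d) coefficient matrices, one per level
    describe            : function returning a d-vector of texture features
                          (e.g. BRIEF descriptors) sampled around the shape
    """
    for r in regression_matrices:          # each level corrects residual error
        features = describe(image, shape)  # texture description of the estimate
        shape = shape + r @ features       # predicted displacement update
    return shape
```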

Texture Model Training Process

A brief description has been given above of the components forming part of the texture model training module 4 of one embodiment. A more detailed description of the operation of these components in this embodiment will now be given with reference to the flow diagram of FIG. 6, for an example computer-implemented training process using the texture model training module 4. Reference is also made to FIGS. 7a to 7c schematically illustrating examples of data that is processed and generated by the texture model training module 4 during the training process.

As shown in FIG. 6, the training process begins at step S6-1 where the texture model training module 4 retrieves a normalised object mesh 10 from the object model database 7. At step S6-3, the model training module 4 retrieves a first one of the plurality of user-defined masks 14 a from the image database 5. FIG. 7a shows an example of a mask 14 a defining a lip region of the reference image 8 shown in FIG. 3a. At step S6-5, the model training module 4 overlays the retrieved mask 14 a on the retrieved normalised object mesh 10 to determine a subset of regions of the normalised mesh 10 that include at least a portion of the masked data regions. FIG. 7b schematically illustrates an example of the masked regions shown in FIG. 7a, overlaid on the normalised mesh 10 shown in FIG. 3d. FIG. 7c schematically illustrates the subset of mesh regions as determined by the texture model training module 4. At step S6-7, the determined subset of mesh regions is stored as an optimised mesh 18 in a texture model 16 for the associated mask 14 b, in the object model database 7. At step S6-9, the model training module 4 determines if there is another mask 14 a in the image database 5 to be processed, and if so, processing returns to step S6-3 where the next mask 14 a is retrieved for processing as described above, until all of the user-defined masks 14 a have been processed in this way.
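
The mesh optimisation step of FIG. 6 can be pictured as the following sketch, which keeps only the faces of the normalised mesh whose triangles overlap at least one masked (non-black) pixel. A bounding-box overlap test is used here for brevity, where a full implementation would use an exact point-in-triangle test; names and conventions are assumptions.

```python
import numpy as np

def optimise_mesh(vertices, faces, mask):
    """Keep only mesh faces whose triangle covers at least one masked pixel.

    vertices : list of (x, y) in mask pixel coordinates
    faces    : list of (i, j, k) vertex index triples
    mask     : HxW grey-scale array, 0 = excluded from colourisation
    """
    ys, xs = np.nonzero(mask)                     # masked pixel locations
    kept = []
    for (i, j, k) in faces:
        tri = np.array([vertices[i], vertices[j], vertices[k]])
        # Cheap test: does the triangle's bounding box contain a masked pixel?
        x0, y0 = tri.min(axis=0)
        x1, y1 = tri.max(axis=0)
        inside = (xs >= x0) & (xs <= x1) & (ys >= y0) & (ys <= y1)
        if inside.any():
            kept.append((i, j, k))
    return kept
```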

Shape Model Training Process

A brief description has been given above of the components forming part of the shape model training module 3 of an exemplary embodiment. A more detailed description of the operation of these components will now be given with reference to the flow diagram of FIG. 8, for an example computer-implemented training process using the shape model training module 3. Reference is also made to FIG. 9 schematically illustrating examples of user-defined shapes defined by labelled feature points, and to FIGS. 10a to 10c schematically illustrating examples of trained global and sub-shape models.

As shown in FIG. 8, the training process may begin at step S8-1 where the shape model training module 3 processes user input to define a plurality of labelled feature points 25 in the training images 23 of the training image database 5. For example, a user interface may be provided to prompt the user to sequentially define a set of feature points 25 for a training image 23, each labelled feature point 25 associated with a respective location in the corresponding training image 23 and having a corresponding unique identifier. FIG. 9 shows an example of a resulting user-defined shape 25 a displayed over an associated training image 23, as defined by the plurality of labelled feature points 25. The data may be defined as a set or array of x and y positions in the image, defining respectively the x-axis and y-axis position in the image of each user-defined feature point 25 in the training image 23. The plurality of feature points 25 may be grouped into subsets of feature locations, each subset corresponding to respective sub-aspects of the overall object. In the present example, the overall object is a subject's face and the sub-aspects may be i) the lips, mouth and chin, and ii) the eyes, eyebrows, nose and face outline.

At step S8-3, the shape model module 21 of the shape model training module 3 determines a global shape model 27 for the trained face model 31, based on the training images 23 and associated feature points 25 retrieved from the training image database 5. Any known technique may be used to generate the global shape model 27. For example, in this embodiment, the shape model module 21 uses the Active Shape Modelling (ASM) technique, as mentioned above. FIG. 10a shows a schematic representation of an example global shape model 27 generated by the shape model module 21 using the ASM technique. In the illustrated example, the global shape model 27 of a subject's face includes three modes of variation as determined by the shape model module 21 from the training data. Each mode describes deviations from the same mean shape 27 a of the global shape model, illustrated in the middle column, the deviations differing for each respective mode. For example, the illustrated mode zero represents deviations resulting from the subject's face turning left and right, the second mode represents deviations of the lip and mouth in various open and closed positions, while the third mode represents deviations of the subject's face tilting vertically up and down.
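
The ASM-style global shape model can be summarised as a mean shape plus principal modes of variation obtained from the aligned training shapes. The sketch below shows only the PCA core of that idea; shape alignment, parameter limits and the remaining ASM machinery are omitted.

```python
import numpy as np

def train_shape_model(shapes, n_modes=3):
    """Mean shape and principal modes of variation from aligned training
    shapes (sketch). shapes: (N, 2m) array of flattened (x, y) coordinates."""
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_modes]     # strongest modes first
    return mean, eigvecs[:, order], eigvals[order]

def generate_shape(mean, modes, params):
    """Deform the mean shape by a weighted sum of the modes of variation."""
    return mean + modes @ params
```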

It will be appreciated that the data structure of the global shape model 27 will depend on the particular shape modelling technique that is implemented by the shape model module 21. For example, the ASM technique processes the distribution of user-defined feature locations in the plurality of training images 23 in order to decompose the data into a set of eigenvectors and eigenvalues, and a corresponding set of parameters/weights between predefined limits, to define a deformable global shape model for a subject's face. The precise steps of the ASM technique are known per se, and need not be described further.

At step S8-5, the shape model module 21 determines one or more sub-shape models 29, again using the same shape modelling technique used to generate the global shape model 27. In this step, the ASM technique, for example, is applied to the respective subsets of feature locations, to generate respective sub-shape models 29 corresponding to respective sub-aspects of the overall face. FIG. 10b shows an example of a first sub-shape model 29-1 corresponding to the lips, mouth and chin of a subject's face. FIG. 10c shows an example of a second sub-shape model 29-2 corresponding to the eyes, eyebrows, nose and face outline of a subject's face. It will be appreciated that the number of modes of variation for a global and sub-shape model may vary depending on the complexity of the associated training data.

Returning to FIG. 8, at step S8-7, the sub-shape appearance module 33 determines a sub-shape appearance model 35 for one or more of the sub-shape models 29 generated by the shape model module 21. In this example embodiment, an appearance model 35 is generated for the first sub-shape model 29-1 corresponding to the lips, mouth and chin of a subject's face. Any known technique for generating an appearance model 35 may be used, for example the Active Appearance Model (AAM) technique, as mentioned above. The particular implementation steps of this technique are known per se, and need not be described further. The result of the AAM technique applied by the sub-shape appearance module 33 is a deformable sub-shape appearance model 35 comprising a mean normalised grey-level vector, a set of orthogonal modes of variation and a set of grey-level parameters.

At step S8-9, the combined sub-model module 37 determines a sub-shape combined model 39 for each sub-shape appearance model 35, based on the corresponding sub-shape model generated by the shape model module 21. For example, the shape model derived from the labelled training images 23 can be processed to generate a set of shape model parameters, and the sub-shape appearance model 35 may be similarly processed to generate corresponding appearance model parameters. The shape model parameters and the appearance model parameters can then be combined, with a weighting that measures the unit differences between shape (distances) and appearance (intensities). As with the ASM and AAM techniques, the combined model can be generated by using principal component analysis and dimensionality reduction, resulting in a deformable combined model represented by a set of eigenvectors, modes of variation and deviation parameters.

At step S8-11, the statistics computation module 41 can be used to compute a set of statistics to improve the robustness of initial positioning of a detected face within a bounding box output by the object detector module 42. This exemplary processing is described in more detail with reference to FIG. 11. As shown in FIG. 11, at step S11-1, the statistics computation module 41 selects a first image from the training images 23 in the image database 5. The corresponding feature points 25 of the user-defined shape for the training image 23 are also retrieved from the training image database 5. At step S11-3, the selected training image 23 is processed by the object detector module 42 to determine a bounding box of a detected subject's face in the image 23. FIG. 12a shows an example of a detected face in a training image, identified by the bounding box 51.

At step S11-5, the statistics computation module 41 determines if the identified bounding box 51 contains the majority of feature points 25 of the corresponding user-defined shape. For example, a threshold of 70% can be used to define a majority for this step. If it is determined that the bounding box 51 does not contain the majority of feature points 25, then position and scale statistics are not computed for the particular training image 23 and processing skips to step S11-13, where the statistics computation module 41 checks for another training image to process. On the other hand, if it is determined that the bounding box 51 contains a majority of the feature points 25, then at step S11-7, the relative position of the user-defined shape, as defined by the feature points 25, within the identified bounding box 51 is calculated. At step S11-9, the statistics computation module 41 calculates the relative scale of the user-defined shape to the mean shape 27 a of the global shape model 27. At step S11-11, the calculated coordinates of the relative position and the relative scale are stored, for example in the training image database 5, for subsequent computations as described below.

At step S11-13, the statistics computation module 41 determines if there is another training image 23 in the database 5 to be processed, and returns to step S11-1 to select and process the next image 23, as necessary. When it is determined that all of the training images 23, or a pre-determined number of training images 23, have been processed by the statistics computation module 41, at step S11-15, a mean and standard deviation of the stored relative position and scale for all of the processed training images 23 is computed, and stored as computed statistics 44 for the particular face detector module 42, for example in the training image database 5.
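
The statistics computed in FIG. 11 amount to, for each training image whose bounding box passes the majority test, a relative position and scale, followed by a mean and standard deviation over all processed images. A hedged sketch, with assumed conventions for the bounding box and shape arrays:

```python
import numpy as np

def relative_position_and_scale(points, box, mean_shape):
    """Relative position of a user-defined shape within the detector's
    bounding box, and its scale relative to the mean shape (sketch).

    points     : (m, 2) labelled feature points
    box        : (x, y, w, h) face detector bounding box
    mean_shape : (m, 2) mean shape of the global model
    """
    x, y, w, h = box
    centre = points.mean(axis=0)
    rel_pos = ((centre[0] - x) / w, (centre[1] - y) / h)
    scale = np.ptp(points, axis=0).mean() / np.ptp(mean_shape, axis=0).mean()
    return rel_pos, scale

# Mean and standard deviation over all processed training images, where
# `collected` holds rows of (rel_x, rel_y, scale):
# stats = np.array(collected)
# mean, std = stats.mean(axis=0), stats.std(axis=0)
```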

Returning to FIG. 8, the offline training process proceeds to step S8-13, where the regression computation module 43 of the shape model training module 3 proceeds to determine regression coefficient matrices 45, 47 for the global shape model 27 and the plurality of sub-shape models 29. This process is described in more detail with reference to FIGS. 13 and 14. The regression computation module 43 computes the regression coefficient matrices 45, 47 based on feature point descriptors and corresponding offsets that are determined from the training images 23 in the database 5. In the present embodiment, the feature point descriptors are Binary Robust Independent Elementary Features (BRIEF) descriptors, derived from the calculated conversion of the input global or sub-shape feature points to a selected image, but other feature descriptors can be used instead, such as ORB, FREAK, HOG or BRISK.

As is known in the art, regression analysis is a statistical process for modelling and analysing several variables, by estimating the relationship between a dependent variable and one or more independent variables. As mentioned above, the regression coefficient matrices 45, 47 define trained functions that represent a series of directions and re-scaling factors, such that a matrix can be applied to a candidate shape model to produce a sequence of updates to the shape model that converge to an accurately located shape with respect to an input image (e.g. a training image during a training process, or a captured image during a tracking process). In this embodiment, the plurality of sub-shape regression matrices 47 are arranged as a cascading data structure. Each regression matrix at level i overcomes situations where the previous regression coefficient matrix did not lead to the final solution. For example, the first, highest-level regression coefficient matrix approximates a linear function that tries to fit all cases in the database. The second and further lower-level regression matrices fit situations that the first-level regression matrix was not able to cope with. This cascading data structure thereby provides a more flexible function with improved generalisation across variations in object shapes. The training process to determine the cascading sub-shape regression coefficient matrices 47 simulates captured image scenarios similar to those which might be captured and processed during the tracking procedure, utilising stored training data 5 defining the real or actual displacement or offset between the estimated and real position of the object shape feature points that are known for the training images 23 in the database 5. The texture around an estimated shape is described by the BRIEF features and the offset between corresponding labelled feature points can be measured in pixel coordinates at the reference image resolution.

As shown in FIG. 13a, at step S13-1, the regression computation module 43 selects a first image 23 and corresponding feature points 25 from the training image database 5. At step S13-3, the regression computation module 43 computes and stores a first set of BRIEF features for the global shape 27 and corresponding offsets, based on the selected training image 23. The process carried out by the regression computation module 43 to process a selected training image 23 is described with reference to FIG. 14.

At step S14-1, the regression computation module 43 generates a pre-defined number of random shape initialisations 53, based on the generated global shape model 27. This generation process involves a bounding box obtained by the object detector module 42 and the output of the statistics computation module 41. A random value is obtained for the x and y displacements within the bounding box and for the scale relation to the mean shape 27 a. Random values are drawn from the central 68% of a normal distribution, that is, within one standard deviation of the mean. For example, twenty random values may be computed for scale and x and y displacements, based on the computed statistics stored by the statistics computation module 41 at step S8-11 above, in order to generate a total of twenty different initialisations for a single bounding box. This sub-process can be seen as a Monte Carlo initialisation procedure which advantageously reduces over-fitting and provides a set of regression coefficient matrices that are capable of more generalised object representations than deterministic methods or single initialisation estimates, for example. FIG. 12b shows an example of various random shape initialisations 53 displayed over the initial global shape model 27, for a particular training image 23.
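
The Monte Carlo initialisation step might look like the following sketch, drawing relative position and scale values within one standard deviation of the learned statistics and placing scaled copies of the mean shape inside the detector bounding box; parameter names are assumptions.

```python
import numpy as np

def random_initialisations(box, mean_shape, stats, n=20, rng=None):
    """Generate random shape initialisations inside a detector bounding box.

    box        : (x, y, w, h) bounding box from the face detector
    mean_shape : (m, 2) mean shape of the global model
    stats      : dict with 'mean' and 'std' arrays for (rel_x, rel_y, scale)
    """
    rng = rng or np.random.default_rng()
    x, y, w, h = box
    inits = []
    for _ in range(n):
        # Truncate draws to within one standard deviation of the mean
        rel = np.clip(rng.normal(stats["mean"], stats["std"]),
                      stats["mean"] - stats["std"],
                      stats["mean"] + stats["std"])
        rel_x, rel_y, scale = rel
        shape = mean_shape * scale
        shape += np.array([x + rel_x * w, y + rel_y * h]) - shape.mean(axis=0)
        inits.append(shape)
    return inits
```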

At step S14-3, a reference shape is determined by scaling the mean shape 27 a of the global shape model 27, based on a pre-defined value specified by the user, for example 200 pixels as the inter-ocular distance. This procedure determines the size of the image in which all the computations will be performed during training and tracking. A conversion from the shape model coordinate frame in unit space to the image plane in pixel coordinates is performed. FIG. 12c schematically illustrates an example of scaling of the mean shape and FIG. 12d schematically illustrates an example of the resulting reference shape 55. At step S14-5, the regression computation module 43 computes the similarity transformation between the reference shape 55 and the plurality of random shape initialisations 53.

At step S14-7, the regression computation module 43 performs image processing on the selected training image 23 to transform the selected training image 23 based on the reference shape 55 and the computed similarity transformation. In this embodiment, the similarity transformation between the current estimate and the reference shape is computed through an iterative process aiming to minimise the distance between both shapes by means of geometric transformations, such as rotation and scaling, to transform (or warp) the selected training image 23. In the first iteration, only scaling has a role, since the first estimation is a scaled mean shape and the rotation matrix will therefore always be an identity matrix. In further iterations, once the initial scaled mean shape has been modified by the refinement process, scale and rotation will both be of great importance. Subsequent regression coefficient matrices will operate on transformed images which will be very closely aligned with the reference shape. FIG. 12e shows examples of various geometric transformations that can be performed on respective training images 23. Advantageously, image transformation in this embodiment is applied globally to the whole image by means of a similarity transformation, in contrast for example to the piece-wise affine warping employed in AAM, whereby no deformation is performed and computation speed is improved considerably.
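For illustration, the scale and rotation components of such a similarity transformation can be estimated by a least-squares (Procrustes-style) alignment of the two point sets, as in the following sketch; this is an assumption of how the alignment might be computed, not a definitive description of the embodiment.

import numpy as np

def similarity_transform(src, dst):
    """Estimate scale s, rotation R and translation t aligning src to dst (both (N, 2) point sets).

    Returns (s, R, t) such that dst is approximately s * src @ R.T + t.  No per-point
    deformation is applied, in contrast to piece-wise affine warping.
    """
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, S, Vt = np.linalg.svd(dst_c.T @ src_c)
    R = U @ Vt                                  # least-squares rotation
    if np.linalg.det(R) < 0:                    # avoid reflections
        U[:, -1] *= -1
        R = U @ Vt
    s = S.sum() / (src_c ** 2).sum()            # least-squares scale
    t = dst.mean(0) - s * src.mean(0) @ R.T
    return s, R, t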

At step S14-9, the regression computation module 43 calculates a conversion of the feature points 25 of the user-defined shape for the selected training image 23, to the corresponding locations for the labelled feature points in the transformed image generated at step S14-7. At step S14-11, the regression computation module 43 calculates a conversion of the input shape, that is the random shape initialisation as generated at step S14-1 (and the current estimated shape in further iterations), to the corresponding feature locations in the transformed image. At step S14-13, the offset between the calculated conversions is determined by the regression computation module 43. At step S14-15, the regression computation module 43 determines a set of BRIEF descriptors for the current estimated shape, derived from the calculated conversion of the input shape feature points to the transformed image. The determined BRIEF descriptor features and corresponding offsets are stored by the regression computation module 43 at step S14-17, for example in the training image database 5.

Returning to FIG. 13a , at step S13-5, the regression computation module43 determines if there is another training image 23 in the database 5 tobe processed and processing returns to steps S13-1 and S13-3 whereregression computation module 43 determines a corresponding set of BRIEFdescriptor features and corresponding offsets, based on each of theremaining, or a predetermined number of, training images 23 in thedatabase 5. Once all of the training images 23 have been processed inthis way, a regression coefficient matrix 45 for the global shape model27 is computed and stored for the trained object model 31 in the objectmodel database 7, taking as input all of the stored offsets and BRIEFfeatures determined from the training images 23.

Accordingly, at step S13-7, the regression computation module 43 computes the regression coefficient matrix 45 for the input global shape, based on the determined BRIEF features and corresponding offsets. In this embodiment, the regression computation module 43 is configured to compute the regression coefficient matrix 45 using a regression analysis technique known as Principal Component Regression (PCR), which reduces the dimensionality of the gathered BRIEF descriptor dataset before performing linear regression using least squares minimisation in order to obtain a regression coefficient matrix. Since the obtained matrix has a dimension equal to the number of selected principal components, a conversion back to the original dimensional space is efficiently computed. As known in the art, regression coefficient matrices are an optimal data structure for efficient facial feature detection, for example as discussed in “Supervised Descent Method And Its Applications To Face Alignment”, Xiong and De la Torre. It is appreciated that alternative known regression analysis techniques may instead be used to compute the regression coefficient matrices, such as least squares regression, etc.
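A minimal sketch of Principal Component Regression over the gathered descriptors might look as follows, where X holds one BRIEF descriptor per row and Y the corresponding feature-point offsets. This is illustrative only; the embodiment does not prescribe a particular implementation.

import numpy as np

def principal_component_regression(X, Y, n_components):
    """Fit a regression matrix mapping BRIEF descriptors X (M, D) to offsets Y (M, K)
    after reducing the descriptor dimensionality with PCA."""
    X_mean = X.mean(axis=0)
    Xc = X - X_mean
    # principal components of the descriptor set
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                      # (D, n_components) projection
    Z = Xc @ P                                   # reduced-dimension descriptors
    # least-squares regression in the reduced space
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    # convert back to the original descriptor space: offsets ~ (X - X_mean) @ R
    R = P @ W                                    # (D, K) regression coefficient matrix
    return R, X_mean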

At step S13-9, the regression computation module 43 updates the global shape model 27 of the current trained model 31 stored in the object model database 7, by applying the respective trained functions defined by the computed global regression coefficient matrix 45 to the global shape model 27. It will be appreciated that the computational process for applying the cascading regression coefficient matrix to the input shape is known per se and will depend on the specific regression analysis technique implemented by the system 1. At step S13-11, the regression computation module 43 processes the random shape initialisations generated at step S14-1 above, to split each random shape initialisation into a respective set of estimated sub-shapes, according to the plurality of defined sub-shape models 29 in the object model database 7. For example, referring to the exemplary shape model in FIG. 5, the defined subset of (x,y) coordinates for features of each sub-shape 29 can be selected from each random shape initialisation to obtain the respective estimated sub-shape.

The regression computation module 43 then processes the plurality of current sub-shapes 29 to generate a respective plurality of cascading sub-shape regression coefficient matrices 47 for each current sub-shape 29, based on the estimated sub-shapes obtained at step S13-11 and the training images 23 in the database 5. In this exemplary embodiment, three cascading sub-shape regression coefficient matrices 47 are defined for each current sub-shape 29. It is appreciated that any number of cascading levels can be defined. At step S13-13, the regression computation module 43 selects a first sub-shape model, and computes and stores respective BRIEF descriptor features for each estimated sub-shape of the current selected sub-shape model 29, and the corresponding offset based on the training images 23 in the database 5, at the current cascade level.

Accordingly, at step S13-15, the regression computation module 43selects a first training image 23 and associated feature points 25 fromthe training image database 5 at step S13-15. At step S13-17, theregression computation module 43 selects a first one of the estimatedsub-shapes of the current selected sub-shape model 29. At step S13-19,the regression computation module 43 determines and stores BRIEFdescriptor features for the selected estimated sub-shape, as well as thecorresponding offsets, based on the current selected training image 23.At step S13-21, the regression computation module 43 determines whetherthere is another estimated sub-shape to process and if so, returns tostep S13-17 to select the next estimated sub-shape to be processed. Onceall of the estimated sub-shapes have been processed based on the currentselected training image 23 at the current cascade level, the regressioncomputation module 43 determines at step S13-23 whether there is anothertraining image 23 to process and if so, processing returns to stepS13-15 where BRIEF features and offsets data collection process isrepeated for the next training image at the current cascade level.

Once all, or a predetermined number, of the training images 23 have been processed in the above way for the current cascade level, the regression computation module 43 computes at step S13-25 a sub-shape regression coefficient matrix 47 for the current selected sub-shape, at the current cascade level, based on all of the determined BRIEF features and corresponding offsets. At step S13-27, the regression computation module 43 updates all of the estimated sub-shapes, by applying the offsets obtained from the respective trained functions defined by the current cascade level sub-shape regression coefficient matrix 47 to the corresponding sub-shape model 29. At step S13-29, the regression computation module 43 determines if there is another cascade level of the cascading sub-shape regression coefficient matrices 47 to be generated, and if so, returns to step S13-15 where the process is iteratively repeated for the remaining cascade levels.

After the regression computation module 43 determines at step S13-29that the current selected sub-shape model 29 has been processed in theabove manner for all of the predetermined cascade levels, then at stepS13-31, the regression computation module 43 determines if there isanother sub-shape model 29 to process and returns to step S13-13 toselect the next sub-shape 29, and to subsequently compute the cascadingregression coefficient matrices 47 for the next selected sub-shape 29and update the next sub-shape 29, until all of the sub-shapes 29 havebeen processed and updated by the shape model training module 3 asdescribed above.

Tracking Process

The tracking process performed by the tracking module 11 in the system 1will now be described in more detail with reference to FIG. 15, whichshows the steps of an example computer-implemented tracking process inanother embodiment of the present invention. Reference is also made toFIGS. 18a to 18e , illustrating an example sequence of user interfacedisplay screens during the tracking process.

As shown in FIG. 15, at step S15-1, the tracking module 11 may performan initialisation sub-process based on received data of an initialcaptured image from the camera. One example of this processing isdescribed in more detail with reference to FIG. 16. As shown in FIG. 16,the process starts with the supply of a camera feed at a step D1. Thecamera captures a (video) image of the user, and displays this to theuser, for example on a tablet computer which the user is holding. Anoverlay is also shown on screen, which might for example comprise anoutline or silhouette of a person's face. The user is required to alignthe image of their face with the overlay at a step D2. An example of thedisplayed image overlay is shown in the representation provided to theleft of the step D2.

At a step D3, a face detection step is carried out, which might for example use Haar-like features (discussed for example in “Zur Theorie der orthogonalen Funktionensysteme”, Haar, Alfred (1910), 69(3): 331-371). These Haar-like features can be used to pick out the location and scale of the face in the image. An example of this, in which the location of the detected face is identified by a bounding rectangle, is shown in the representation provided to the left of the step D3. At a step D4 it is determined whether or not the face has been detected. If the face has not been detected, then processing cannot go any further, and the process returns to the step D2, for the user to realign their face with the overlay. If the face has been detected, then at a step D5 a mouth detection step is carried out, which might again for example use Haar-like features, this time to pick out the location of the mouth. In order to improve processing efficiency, the search for the mouth can be constrained to the lower part of the bounding rectangle already found for the face. An example of a detected mouth area is shown in the representation provided to the left of the step D5. At a step D6, it is determined whether or not the mouth has been detected. If the mouth has not been detected, then processing cannot go any further, and the process returns to the step D2, for the user to realign their face with the overlay.
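For illustration only, a face and mouth detection step of this kind could be sketched with OpenCV's Haar cascade classifiers as below. The frontal-face cascade ships with OpenCV, whereas the mouth cascade file name is an assumption and may need to be obtained separately.

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
mouth_cascade = cv2.CascadeClassifier("haarcascade_mcs_mouth.xml")  # assumed available

def detect_face_and_mouth(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    if len(faces) == 0:
        return None, None                       # user must realign with the overlay
    x, y, w, h = faces[0]
    # constrain the mouth search to the lower part of the face bounding rectangle
    lower = gray[y + h // 2 : y + h, x : x + w]
    mouths = mouth_cascade.detectMultiScale(lower, scaleFactor=1.2, minNeighbors=10)
    if len(mouths) == 0:
        return (x, y, w, h), None
    mx, my, mw, mh = mouths[0]
    return (x, y, w, h), (x + mx, y + h // 2 + my, mw, mh)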

If the mouth has been detected, then at a step D7 a process of building foreground and background histograms is carried out. Foreground refers to the target area to be detected, for example lip regions, and background refers to the area to be excluded from the foreground, for instance skin regions. The foreground and background histograms are populated with the frequency of colour values occurring in different regions of the image. These regions are defined, for example, by a mask created with the face as background and the mouth as the foreground, as discussed above. In some embodiments one or more histogram updates might be carried out using the same source image and the same mask. The foreground/background histogram building process uses as an input a version of the camera feed, which may be converted from the camera image data colour space (e.g. RGB/RGBA) to a working colour space (e.g. YCrCb), at a step D10. The input colour format depends on the camera installed in the device employed by the user. It is appreciated that the YCrCb colour space is useful, since the histogramming can be carried out in two dimensions by ignoring luminance (Y) and utilising only the colour difference values Cr and Cb.

The step D7 comprises a sub-step D7 a of providing exclusive histogram updates based on a face area (background/skin) provided at a step D11 and a mouth area (foreground/lips) provided at a step D12. By exclusive it is meant that an update of the foreground histogram by a foreground mask increases the frequency of the corresponding colour but also updates the background histogram by decreasing the frequency of that same colour. In other words, if a colour belongs to the foreground it cannot belong to the background. Therefore, the update of any colour coming from the background or foreground produces effects in both histograms. The representation visible between the steps D10 and D11 illustrates the mouth area (white, foreground) and the face area (black, background) employed in the exclusive histogram updates step D7 a. At a step D7 a 1, a background histogram is updated with the frequency of occurrence of each colour value within the face area (but outside of the mouth area). Similarly, at a step D7 a 2, a foreground histogram is updated with the frequency of occurrence of each colour value within the mouth area. The next steps which take place in the histogram building procedure D7 are meant to improve the quality of the generated histograms.
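The exclusive update can be sketched as follows, assuming a 64-bin two-dimensional CbCr histogram; the bin count and the helper name exclusive_update are illustrative assumptions.

import numpy as np

BINS = 64  # CbCr histogram resolution (assumed)

def exclusive_update(fg_hist, bg_hist, cb, cr, is_foreground):
    """Update both histograms for one pixel: a colour claimed by one class is
    simultaneously removed from the other, so it cannot belong to both."""
    i, j = cb * BINS // 256, cr * BINS // 256
    if is_foreground:
        fg_hist[i, j] += 1
        bg_hist[i, j] = max(bg_hist[i, j] - 1, 0)
    else:
        bg_hist[i, j] += 1
        fg_hist[i, j] = max(fg_hist[i, j] - 1, 0)

fg_hist = np.zeros((BINS, BINS))   # lips (foreground)
bg_hist = np.zeros((BINS, BINS))   # skin (background)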

The background histogram, foreground histogram, and the converted image data are provided to a probability map computation step D7 b, which for instance uses a Bayesian framework (or a similar statistical technique) to determine the probability of a particular pixel belonging to the lips (foreground) by means of the foreground and background histograms. An example of such a probability map is shown to the right of the step D7 b. The probability map computation can be calculated using Bayesian inference to obtain the posterior probability according to Bayes' rule, as demonstrated below:

${P\left( A \middle| B \right)} = {\frac{{P\left( B \middle| A \right)}{P(A)}}{P(B)} \propto {{P\left( B \middle| A \right)}{P(A)}}}$

The probability of a pixel with colour (Cb, Cr) belonging to the foreground (i.e. being lip) can be computed as follows:

${P\left( {{Cb},{Cr}} \right)} = \frac{P\left( {\left. {lip} \middle| {Cb} \right.,{Cr}} \right)}{{P\left( {\left. {lip} \middle| {Cb} \right.,{Cr}} \right)} + {p\left( {\left. {nonlip} \middle| {Cb} \right.,{Cr}} \right)}}$where P(lip|Cb, Cr) = P(Cb, Cr|lip) ⋅ P(lip)P(nonlip|Cb, Cr) = P(Cb, Cr|nonlip) ⋅ P(nonlip)

The conditional probabilities and priors are calculated by means of the statistics stored during the histogram building procedure, as follows:

${P\left( {{Cb},\left. {Cr} \middle| {lip} \right.} \right)} = \frac{{foregroundHistogram}\mspace{11mu} \left( {{Cb},{Cr}} \right)}{numLipPixels}$${P\left( {{Cb},\left. {Cr} \middle| {nonlip} \right.} \right)} = \frac{{backgroundHistogram}\mspace{11mu} \left( {{Cb},{Cr}} \right)}{numNonLipPixels}$${P({lip})} = \frac{numLipPixels}{numTotalPixels}$${P({nonlip})} = \frac{numNonLipPixels}{numTotalPixels}$

Once the probability map of being lip has been computed around the mouth area, the result will be used in order to reinforce the histogram quality through a clustering process which will produce a finer segmentation of the lip area.
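For illustration, the probability map computation of step D7 b might be sketched as below, applying the Bayes-rule expressions above to per-pixel CbCr bin indices; the function name and the epsilon guard are assumptions made for the example.

import numpy as np

def lip_probability_map(cb, cr, fg_hist, bg_hist, eps=1e-8):
    """Posterior probability of each pixel being lip, from the CbCr histograms.

    cb, cr are integer bin index arrays (same shape as the mouth-area image);
    fg_hist / bg_hist are the foreground (lip) and background (skin) histograms."""
    num_lip = fg_hist.sum()
    num_nonlip = bg_hist.sum()
    total = num_lip + num_nonlip
    # likelihoods and priors, as in the formulas above
    p_col_lip = fg_hist[cb, cr] / (num_lip + eps)
    p_col_nonlip = bg_hist[cb, cr] / (num_nonlip + eps)
    p_lip, p_nonlip = num_lip / total, num_nonlip / total
    post_lip = p_col_lip * p_lip
    post_nonlip = p_col_nonlip * p_nonlip
    return post_lip / (post_lip + post_nonlip + eps)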

At a step D7 c, cluster centres for background and foreground areinitialised in CbCr colour space. The background cluster centre iscomputed with colour values corresponding to pixels within theprobability map (and thus constrained to the mouth area) which have anassociated probability of less than a predetermined threshold value—forexample a value of 0.5 in the case of a probability range of 0 to 1. Theforeground cluster centre is calculated with colour values correspondingto pixels within the probability map (and thus constrained to the moutharea) which have an associated probability higher than the predeterminedthreshold value. The cluster centre for each of these is determined asthe centre of gravity of all of the points belonging to foreground orbackground.

An example of the initialization of the clustering procedure, showing the two cluster centres, is visible in the representation to the left of and slightly above the step D7 c. Here, colour values detected as background can be observed in a light grey colour and foreground pixels in a dark grey tone. This figure represents the probability map, shown in the representation on the right of the process D7 c, expressed in the CbCr colour space. It is noticeable that the number of pixels belonging to the foreground is very sparse and indeed difficult to appreciate in the figure; it is nevertheless sufficient to give an accurate approximation of where the centre of the cluster might be. This proximity of the clusters is due to the high similarity between skin and lip colour. In the case of selecting skin as foreground and any other colour as background, the clusters would be much further apart and the situation easier to handle. This is therefore an extreme example which demonstrates the effectiveness of the algorithm.

At a step D7 d, a fuzzy c-means clustering algorithm is used to associate the colour values in the CbCr space observed in the mouth area with the closest cluster centre. This can be carried out by determining the degree of membership of each colour value to the foreground cluster centre. This effectively shifts certain colour values from belonging to one cluster to belonging to the other cluster. An example of the reordering provided by this process is visible in the representation provided to the left of and slightly above the step D7 d. The output of this process generates a probability map equivalent to that generated from the original histogram data, but showing a much stronger lip structure, as visible in the representation provided beneath the cluster representations. It should be noted that only a single pass of the fuzzy c-means clustering algorithm is carried out (no iteration), and there is no re-computation of the cluster centres. This is because the clusters are very close together and further iterations might cause misclassifications.

The fuzzy c-means clustering may be carried out by minimising thefollowing objective function:

$J_{m} = \sum\limits_{i = 1}^{N}\sum\limits_{j = 1}^{C} u_{ij}^{m}\,\lVert x_{i} - c_{j} \rVert^{2},$

where $1 \leq m < \infty$ and $u_{ij}$ is the degree of membership of $x_{i}$ (a CbCr value) in the cluster $j$.

$u_{ij} = \frac{1}{\sum\limits_{k = 1}^{C}\left( \frac{\lVert x_{i} - c_{j} \rVert}{\lVert x_{i} - c_{k} \rVert} \right)^{\frac{2}{m - 1}}},$

where m (fuzziness)=2

$c_{j} = \frac{\sum\limits_{i = 1}^{N}{u_{ij}^{m} \cdot x_{i}}}{\sum\limits_{i = 1}^{N}u_{ij}^{m}}$
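A single-pass membership computation of this kind might be sketched as follows, assuming the CbCr values and the two cluster centres are supplied as arrays; consistent with the text above, the cluster centres are not re-computed.

import numpy as np

def fuzzy_membership(points, centres, m=2.0, eps=1e-8):
    """Single-pass fuzzy c-means membership of CbCr points (N, 2) to cluster
    centres (C, 2); the centres are not re-computed (no iteration)."""
    # distances from every point to every cluster centre
    d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2) + eps
    power = 2.0 / (m - 1.0)
    # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
    u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** power, axis=2)
    return u   # (N, C); the foreground column can serve as the refined lip probability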

After the computation of step D7 d, a further exclusive histogram update step reinforces the content of the histograms based on the output of the clustering stages. In particular, the background histogram is populated with the frequency of occurrence of colour values in the background (face area), i.e. associated with the background cluster, while the foreground histogram is populated with the frequency of occurrence of colour values in the foreground (lip area), i.e. associated with the foreground cluster. The representation to the left of and above the step D7 f shows the regions employed for the histogram updates, where the background is the face area and the new strongly defined lip area forms the foreground. Following the histogram building step, at a step D8 it is determined whether a sufficient number of initialisation frames have been processed for the completion of the histogram building process. If fewer than N frames have been processed then the process returns to the step D2, where the user is required to maintain facial alignment with the overlay, and the process of face/mouth detection, histogramming and clustering starts again.

The histograms are accumulated in this way over several frames, improving the robustness of the foreground and background histograms. When at the step D8 it is determined that the threshold number of initialisation frames has been reached, the initialisation process finishes, and the initialised histograms are carried through into the next stage of real-time processing. At this stage the displayed overlay can be removed from the display. It should be understood that while the histogram does not need updating every frame during the tracking process, it is desirable to update the histogram periodically, for example to account for lighting changes. The reinforcement of the histograms can take place after the initialisation and during the tracking procedure, in order to cope with situations in which the scene changes, such as lighting changes which directly affect the colour features.

Returning to FIG. 15, at step S15-2, the initialised tracking module 11receives captured image data from the camera 9, which can be an image ina sequence of images or video frames. At step S15-3, the tracking moduledetermines if an object, a subject's face in this exemplary embodiment,was previously detected and located for tracking in a prior image orvideo frame. In subsequent iterations of the tracking process, thetracking module 11 may determine that the object was previously detectedand located, for example from tracking data (not shown) stored by thesystem 1, the tracking data including a determined global object shapeof the detected object, which can be used as the initialised globalobject shape for the current captured image. As this is the first timethe tracking process is executed, processing proceeds to step S15-5where the captured image data is processed by the object detector module42 to detect an object in the image and to output a bounding box 51 ofan approximate location for the detected object. At step S15-7, thetracking module 11 initialises the detected object shape using thetrained global shape model 27, the statistics computed at step S8-11above, and the corresponding global shape regression coefficient matrix45 retrieved from the object model database 7, based on the image datawithin the identified bounding box 51. FIG. 18a shows an example of aninitialised object shape 71 within the bounding box 51, displayed overthe captured image data 73. The trained object model may be generated bythe shape model training module 3 as described by the training processabove. As shown, the candidate object shape at this stage is an initialapproximation of the whole shape of the object within the bounding box51, based on the global shape model 27. Accordingly, the location andshape of individual features of the object, such as the lips and chin inthe example of FIG. 18a , are not accurate.

At step S15-9, the tracking module 11 performs processing to refine the initialised global object shape using the trained sub-shape models 29 and the corresponding cascading regression coefficient matrices 47 for each sub-shape model 29. This processing is described in more detail with reference to FIG. 17. As shown in FIG. 17, at step S17-1, the refinement process starts with the tracking module 11 computing and adjusting the nearest plausible shape fitting the global shape model. The weighting of the eigenvectors or parameters of the model for the computed plausible shape should be contained within the scope of valid shapes, where a valid shape is defined as one whose parameters lie within predefined boundaries. Given the shape computed in the previous frame, it is checked whether the output from the sub-shape regression coefficient matrices, computed independently, fits the global shape model definition before proceeding further. Accordingly, at step S17-3, it is determined if the percentage of parameters out of boundaries is greater than a predefined threshold a. In the positive case, tracking of the object is considered to be lost; the refinement process is terminated and processing may return to step S15-1, where a new captured image is received from the camera for processing. Otherwise, the refinement module 61 proceeds to adjust the object shape to fit the global shape model 27, at step S17-3.

At step S17-5, the refinement module 61 computes a similarity transformation between the adjusted shape and the reference shape defined at step S14-3. At step S17-7, the captured image is transformed based on the computed similarity transformation. At step S17-9, the refinement module 61 calculates a conversion of the adjusted shape to the transformed image. FIG. 18b shows an example of the refined, adjusted global object shape 71 a displayed over the captured image data 73. At step S17-11, the refinement module 61 determines a plurality of candidate sub-shapes from the current adjusted global shape, based on the sub-shape models 29 as discussed above. The candidate sub-shapes are then updated by iteratively applying the corresponding cascading sub-shape regression coefficient matrices 47 to each sub-shape, starting with the highest, most generalised cascade level.

Accordingly, at step S17-13, the refinement module 61 selects a first ofthe candidate sub-shapes. The refinement module 61 then determines atstep S17-15 a BRIEF descriptor for the candidate sub-shape, based on thetransformed image at the current cascade level. At step S17-17, therefinement module 61 updates the current candidate sub-shape based onthe corresponding sub-shape regression coefficient matrix 47 at thecurrent cascade level, retrieved from the object model database 7. Asdiscussed above, this updating step will depend on the particularregression analysis technique implemented by the system 1 to apply thetrained function defined by the sub-shape regression coefficient matrix47 to the sub-shape data values. At step S17-19, the refinement module61 determines if there is another candidate sub-shape to process andreturns to step S17-13 to select the next sub-shape to be processed atthe current cascade level. Once all of the candidate sub-shapes havebeen processed at the current cascade level, the refinement module 61determines at step S17-20 if there is another cascade level to process,and processing returns to step S17-13 where the sub-shape refinementprocess is repeated for the next cascade level. FIGS. 18c and 18d showexamples of respective sequences of refinement of the two objectsub-shapes 75-1, 75-2, displayed over the captured image data 73.

When it is determined at step S17-20 that all of the sub-shapes havebeen processed for all of the cascade levels of the sub-shape regressioncoefficient matrices 47, then at step S17-21, the refinement module 61checks if a predefined accuracy threshold needs to be met by the refinedsub-model, for example a two pixel accuracy. It will be appreciated thatapplying an accuracy threshold is optional. If the accuracy is notwithin the pre-defined threshold, then processing proceeds to stepS17-23 where the refinement module 61 determines if the percentage ofeigenvector weights is under a second pre-defined limit b in sub-modelparameters. If not, the refinement process is terminated and processingproceeds to step S15-11 discussed below. On the other hand, if it isdetermined at S17-21 that the pre-defined accuracy threshold needs to bemet, then at step S17-25, the refinement module 61 performs processingto refine the corresponding sub-shape appearance and combined models 35,39. For example, the sub-shape appearance model 35 can be refined usingknown AAM techniques. At step S17-27, the refinement module 61 convertsthe refined sub-shapes 29 back to the original image from the referenceimage coordinate frame, and brings together the respective separate datastructures for the previously split candidate sub-shapes, back into aglobal shape framework. FIG. 18e shows an example of the further refinedglobal object shape 71 a displayed over the captured image data 73, as aresult of the refinement of the object sub-shapes 75, which is moreefficient and accurate than carrying out further refinement of theglobal object shape 71.

After the object refinement process is completed, processing proceeds tostep S15-10 in FIG. 15, where the tracking module 11 determines whetherrefinement of the detected object sub-shapes within the acceptableparameters was successfully achieved at step S15-9. If not, for exampleif it was determined at step S17-3 or step S17-23 that tracking of theobject was lost, then processing can return to step S15-1, where a newcaptured image is received from the camera for processing in a newiteration by the tracking module 11.

Otherwise, if the tracking module 11 determines that acceptablesub-shape refinement was achieved by the processing at step S15-9, thenat step S15-11, the tracking module 11 optionally applies an exponentialsmoothing process to the object shape, based on the object shapedetected on the previous frame when available. Exponential smoothing canbe carried out on the estimated object shape data in order to producesmoothed data for presentation purposes, based on the followingexemplary equation:

$s_{t} = \alpha x_{t} + (1 - \alpha)\, s_{t-1}$

where $s_{t-1}$ is the smoothed object shape determined from the previous frame, $s_{t}$ is the smoothed version of the current estimated object shape $x_{t}$, and α is a weighting value which is adapted automatically during runtime. It will be appreciated that this smoothing technique advantageously provides improved visualisation of the estimated shape(s), so forecasts need not be computed to predict where the object might be in the next frame. The complex environments in which the invention aims to operate include unknown lighting conditions and movements of both the camera and the tracked object, giving rise to very complicated motion models, with no ground truth of the real position or measurement available for use in the update step of more complicated tracking strategies such as Kalman filtering.
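A minimal sketch of this per-frame smoothing is given below. The fixed default value of alpha is an assumption made for the example, whereas the embodiment described above adapts the weighting automatically at runtime.

import numpy as np

class ShapeSmoother:
    """Exponentially smooth the tracked shape between frames: s_t = a*x_t + (1 - a)*s_{t-1}."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha      # could be adapted at runtime, e.g. from inter-frame motion
        self.prev = None

    def __call__(self, shape):
        shape = np.asarray(shape, dtype=float)
        if self.prev is None:
            self.prev = shape   # first frame: nothing to smooth against
        self.prev = self.alpha * shape + (1.0 - self.alpha) * self.prev
        return self.prev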

In this way, a robust, accurate and efficient technique for locating andtracking sub-aspects, such as facial features of a global detectedobject, such as a subject's face, is provided. A number of advantageswill be understood from the above description of the embodiments of thepresent invention. In particular, the tracking technique is efficientand robust to more generalized object models, by obtaining an initialrough estimate of a candidate global shape using the trained globalshape model, and subsequently refining the respective candidatesub-shapes of the candidate global shape by applying the correspondingsub-shape regression coefficient matrices to obtain the displacementswhich leads to accurate positions of the object features to track.Therefore, the global shape model and corresponding regressioncoefficient matrix is applied only once to the image data, and eachsubsequent iteration of the refinement sub-process involves asignificantly lesser amount of data, due to the provision of sub-shapemodels defined by subsets of feature points of the global shape andcomputation using the corresponding reduced size regression coefficientmatrices, making the technique suitable to be used for real-timeapplications, particularly in computing devices with reduced hardwarecapabilities, such as limited memory and/or processor resources.

Colourisation and Augmentation

The colourisation process performed by the colourisation module 13 inthe system 1 will now be described in more detail with reference to FIG.19, which shows the steps of an example computer-implementedcolourisation process in another embodiment of the present invention.Reference is also made to FIG. 20, showing examples of data that isprocessed by, and processing steps performed by the colourisation moduleduring the colourisation process. As shown in FIG. 19, at step S19-1,the colourisation module 13 selects a first set of colourisationparameters 26 from the colourisation parameters database 28. At stepS19-3, the colourisation module 13 retrieves the texture model 16 andthe texture data file 19 associated with the selected set ofcolourisation parameters 26.

In the illustrated example of FIG. 20, four texture models 16 are retrieved from the object model database 7, each with a respective different mask 14 b and optimised mesh 18. Each retrieved texture model 16 is selected based on a corresponding set of colourisation parameters 26. A first mask 14 b-1 defines a masked lip region of the reference image 8 and is associated with a first optimised mesh 18-1 defining polygonal areas around the masked lip region. A second mask 14 b-2 defines two masked eye regions of the reference image and is associated with a second optimised mesh 18-2 defining polygonal areas around the masked eye regions. A third mask 14 b-3 defines two masked cheek regions of the reference image 8 and is associated with a third optimised mesh 18-3 defining polygonal areas around the cheek regions. A fourth mask 14 b-4 defines a masked skin region of the reference image and is associated with a fourth optimised mesh 18-4 defining polygonal areas of the masked skin region.

At step S19-5, the colourisation module 13 selects a first region of theoptimised mesh 18 from the retrieved texture model 16. At step S19-7,the transform module 20 determines a set of transformation values bymapping the coordinates of the vertices of the selected region to thelocation of the corresponding tracked feature point determined by thetracking module 11. At step S19-9, the transform module 20 retrieves thecorresponding region of texture data 19, again as referenced by thevertices of the selected region, and applies the transformation to theretrieved region of texture data to generate a corresponding warpedtexture data region. Optionally, the transform module 20 may alsoretrieve the corresponding region of mask data 14 b, as defined by thevertices of the selected region, and apply the transformation to theretrieved masked data to generate corresponding warped masked data forthe selected region. At step S19-11, the colourisation module 13 appliesone or more image colourisation adjustments to the warped texture dataregion using the shader module 22 as defined by the shader valueparameter 26-3. As will be described below, the shader modules 22 mayoptionally take into account the warped mask data region, depending onthe particular shader sub-modules that are used.
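For illustration, the per-region transformation and warping described above could be sketched with OpenCV as follows, treating each polygonal region as a triangle whose vertices map onto the corresponding tracked feature points. The function name and the use of an affine warp per triangle are assumptions made for the example.

import cv2
import numpy as np

def warp_texture_region(texture, tri_src, tri_dst, frame_shape):
    """Warp one triangular region of the augmentation texture onto the captured frame.

    tri_src : (3, 2) vertex coordinates of the region in the reference texture
    tri_dst : (3, 2) coordinates of the corresponding tracked feature points
    """
    M = cv2.getAffineTransform(tri_src.astype(np.float32), tri_dst.astype(np.float32))
    warped = cv2.warpAffine(texture, M, (frame_shape[1], frame_shape[0]),
                            flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)
    # keep only the pixels inside the destination triangle
    mask = np.zeros(frame_shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, tri_dst.astype(np.int32), 255)
    return warped, mask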

At step S19-13, the colourisation module 13 determines if there is another region of the optimised mesh 18 to be processed, and if so, processing returns to step S19-5 where the next region is selected for processing as discussed above, until all of the regions of the optimised mesh 18 have been processed in this way. At step S19-17, the colourisation module 13 then determines if there is another set of colourisation parameters 26 to be processed for the current captured image frame. If so, processing returns to step S19-1 where the next set of colourisation parameters 26 is selected and processed as discussed above, until all of the sets of colourisation parameters 26 have been processed in this way.

At step S19-19, the renderer 24 retrieves and overlays all of theoptimised meshes 18 as a sequence of layered data to be applied to thecaptured image data. This is schematically illustrated at S20-1 in FIG.21. At step S19-21, the renderer 24 performs an alpha blend of theadjusted texture data regions associated with each of the layeredoptimised meshes 18, as output by the respective shader modules 22. FIG.21 shows an example of the blended result at S20-2. The renderer 24 thenoverlays the blended results on the original captured image data foroutput to the display 15, at step S19-23. FIG. 21 shows an example ofthe resulting augmented image data at S20-3.

It will be appreciated that this is just one exemplary sequence ofprocessing steps to retrieve the respective regions of texture data 19defined by image coordinates corresponding to the vertices of the maskedregions defined by the optimised mesh 18. As one alternative, thecolourisation module 13 may be configured to determine a set oftransformation values by mapping all of the vertices of the normalisedmesh 10 as a whole to the respective corresponding labelled featurepoints of the tracking data, whereby the determined transformationvalues can be applied to each region of texture data and mask data asdiscussed above. FIG. 20 schematically illustrates an exemplary sequenceof data that may be processed by, and processing steps performed by, thetransform module 20 to determine transformation of mesh data. In theillustrated example, the reference image 8 and detected tracking featurepoint data 25 can be combined with the normalised mesh 10, to produce asingle mesh including the coordinates of the vertices from the trackeddata 25 and the coordinates of the vertices from the normalised mesh 10.The vertices from the normalised mesh 10 are mapped to the vertices ofthe tracked data 25, to determine respective transformation values basedon the respective coordinates for each corresponding pair of vertices,for example in terms of translation in the two-dimensional plane. Theresulting transformation values can be illustrated as a morphed result,which can be subsequently applied to at least a portion of a mask data14 b and texture data 19, as described above.

The resulting augmented image with the applied texture and colourisationis output at step S12-15 for example on display 15. At step S12-17, thetracking module 11 determines if there is a new captured image frame toprocess when processing returns to step S12-3 to continue tracking ofthe object from the last/previous frame, from step S12-9 onwards.

Shader Modules

FIG. 22, which comprises FIGS. 22a to 22d, schematically illustrates exemplary shader modules 22 and respective processes for applying colourisation adjustments, as set out in step S18-13 above, to identified portion(s) of associated texture data. Each shader module 22 is defined by a predetermined set of shader sub-modules 32 for performing respective adjustments to the texture image data and/or captured image data, optionally taking into account properties 26-1 of the present set of colourisation parameters 26.

FIG. 22a illustrates a first example of a lip shader module 22-1 forapplying colourisation to a portion of the captured image data based ona corresponding portion of a lipstick detail texture 20-1. In thisexample, a lip mask 14 b-1 defines the masked portion as the lips of aface in the captured image data, for example as shown in FIG. 7d . At astep G1, the warped region of the lipstick detail texture data file 19-1is provided. This is a predetermined lip image 20-1 warped into theshape of the detected object in the captured image frame, and carrying atexture such as glossy or matte. At step G2, the captured image datafrom the camera 9 is provided, in which the user's face will typicallybe visible. At step G7, a highlight adjustment shader sub-module 32-1uses the lipstick detail texture 20-1 and captured image data to performa blend operation in a highlight adjustment stage. This blend operationserves to average (per pixel) the luminance of the lipstick detailtexture and captured image data. This adds additional detail to thecaptured image data which may in some cases show quite featureless lips.For example, the operation can be applied on a per channel basis for theinput pixels a, b, across the red, blue and green channels, as follows:

a ∈ [r, g, b], b ∈ [r, g, b]

$f_{softlight}(a, b) = \begin{cases} 2ab + a^{2}(1 - 2b), & \text{if } b < 0.5 \\ 2a(1 - b) + \sqrt{a}\,(2b - 1), & \text{otherwise} \end{cases}$

This is followed by a greyscale conversion step G8 to convert thecombined output of the captured image data and lipstick detail texture20-1 (output of step G7) into greyscale. For example, this can becalculated as a weighted sum of the colour channels, with weights set tobest match the human perception of colour, as follows:

$f_{grayscale}(r, g, b) = 0.2125\,r + 0.7154\,g + 0.0721\,b$

Then, the exposure of the output of the step G8 is adjusted at a stepG9, based on an exposure property 26 a-2, to influence the brightnesslevel at which highlight features would be added to the lip texture, andhas the effect of nonlinearly increasing or decreasing the input value.For example, exposure can be computed as:

$f_{exposure}(x, n) = x \cdot 2^{n}$

As discussed above, the various properties taken into account by theshader sub-modules in this process can be defined by the presentselected set of colourisation parameters 26.

Similarly, at a step G10 the gamma of the greyscale image is adjusted,using a gamma property 26 a-3, for the same reasons as the step G9. Theresult of G9 and G10 may be a pixel value which has either beenemphasised (brightened) or diminished (reduced in brightness). G10 hasthe effect of nonlinearly adjusting the greys of an image eitherboosting or diminishing their output value without adjusting eithercomplete white or complete black as follows:

${f_{gamma}\left( {x,g} \right)} = x^{\frac{1}{g}}$

A multiply shininess step G11 then modifies the shininess of the greyscale image/texture based on a shininess property 26 a-4. In other words, the step G11 linearly modulates the pixel value to inhibit harsh lighting effects. The purpose of the steps G9 to G11 is to emphasise existing areas of brightness in the final augmented lip texture. The resulting output of the highlight adjustment sub-module 32-1 is passed to a first processing step of a blend colour adjustment shader sub-module 32-2.
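By way of illustration, the highlight adjustment chain of steps G7 to G11 might be sketched as follows, with all images taken as floating-point arrays in the range 0 to 1 (an assumption; the embodiment does not prescribe a particular numeric representation).

import numpy as np

def blend_soft_light(a, b):
    """Per-channel soft-light blend of camera pixels a with the lipstick detail texture b."""
    return np.where(b < 0.5,
                    2 * a * b + a ** 2 * (1 - 2 * b),
                    2 * a * (1 - b) + np.sqrt(a) * (2 * b - 1))

def highlight_intensity(camera, lip_detail, exposure, gamma, shininess):
    """CH = Gamma(Exposure(Greyscale(SoftLight(camera, detail)))) * shininess."""
    blended = blend_soft_light(camera, lip_detail)
    grey = (blended * np.array([0.2125, 0.7154, 0.0721])).sum(axis=-1)
    exposed = grey * (2.0 ** exposure)                  # nonlinear brightness boost / cut
    gammad = np.clip(exposed, 0.0, 1.0) ** (1.0 / gamma)
    return gammad * shininess                           # linear modulation to avoid harsh lighting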

At a step G12, a lip colour adjustment shader sub-module 32-3 performs agreyscale operation on the captured image data as a first step toconvert incoming pixel colour values into greyscale. Then, at a step G13the greyscale image is blended with a lip colour property 26 a-1(selected lip colour property—from a step G3) to form an overlay. Theresulting output of the lip colour adjustment sub-module 32-3 is alsopassed to the blend colour adjustment shader sub-module 32-2.

a ∈ [r, g, b], b ∈ [r, g, b]

$f_{overlay}(a, b) = \begin{cases} 2ab, & \text{if } a < 0.5 \\ 1 - 2(1 - a)(1 - b), & \text{otherwise} \end{cases}$

Meanwhile, at a step G4 a static noise texture, such as simple Gaussian noise, is provided as a 2D image. A glitter texture is provided at a step G5 (again Gaussian noise as a 2D image, but in this case warped to the shape of the lips/model). Optionally, an appearance model texture may be provided as input for further colour adjustment, for example to a Gaussian blur at a first step G14 of a glitter adjustment shader sub-module 32-4 to soften the edges of the lip model texture. The blurred model, and the static and warped textures, may be passed to a multiply step G15 in combination with a glitter amount property 26 a-5. The textures are multiplied together (weighted by the glitter amount property 26 a-5) so that the pixel values (greyscale) of spatially correlated pixels within the respective 2D images are multiplied together. When the lips (and the model) move, the warped texture will move with respect to the static texture, causing a sparkling effect on the lips. The resulting output of the glitter adjustment sub-module 32-4 is also passed to the blend colour adjustment shader sub-module 32-2.

At a step G18, the outputs of the steps G11, G13 and G15 are addedtogether in the first step of the blend colour adjustment shadersub-module 32-2. At a step G16, a lighting model adjustment sub-modulecomputes a lighting model adjustment by linearly interpolating theblurred appearance model texture based on a 50% grey level set at a stepG17 and a lighting property 26 a-6 (which controls how much influence isprovided by the output of the appearance model, and how much influenceis provided by the fixed grey level). The overlay generated at the stepG18 is then blended with the lighting model by the blend colouradjustment sub-module 32-2, at a step G19. The purpose of the lightingmodel adjustment is to emphasise the detail taken from the appearancemodel texture, while controlling the level of influence this has (usingthe lighting property 26 a-6 and G17 grey level) so as not to produceharsh, dominating effects. The output of the step G19 is then furtherlinearly interpolated based on alpha value of the lip colour property 26a-1 (to control the balance between the original input image and theaugmented overlay) and the captured image at a step G20.

$f_{lerp}(a, b, w) = a + w(b - a)$

At a step G21, an alpha blend adjustment sub-module 32-6 applies aGaussian blur operation to soften the edges of the lip mask data 14 b-1(defining which parts of an image are lip and which are not) at stepG21, and then at a step G22 is used to perform an alpha blend stage withthe adjusted overlay, received from the blend colour adjustmentsub-module 32-2, and the captured image data.

$f_{alphablend}(a, b, w) = a \cdot w + b \cdot (1 - w)$

Advantageously, this prevents the colourisation from being applied outside the lip region of the input image, and softens the colourisation at the boundary of the lips. In summary, the overall augmented lip colour computed by this exemplary lip shader module 22-1 is calculated as follows:

- Highlight Adjustment:

CH = Gamma(Exposure(Greyscale(BlendSoftLight(WC, LD)), EP), GP) * SP

where CH is the computed highlight intensity, WC is the captured image pixel colour, LD is the Lipstick Detail Texture pixel colour, EP is the Exposure Property 26 a-2, GP is the Gamma Property 26 a-3, and SP is the Shininess Property 26 a-4.

- Lip Colour Adjustment:

CC = Overlay(LC, Greyscale(WC))

where CC is the computed lip colour, and LC is the Lip Colour Property 26 a-1.

- Glitter Adjustment:

CG = GT * NT * Gaussian(AM) * GA

where CG is the computed glitter intensity, NT is the Static Noise Texture pixel colour, GT is the Glitter Texture pixel colour, AM is the Appearance Model pixel colour, and GA is the Glitter Amount Property 26 a-5.

- Lighting Model Adjustment:

CL = Lerp(0.5, AM, LP)

where CL is the computed lighting model intensity, and LP is the Lighting Property 26 a-6.

- Blend Colour Adjustments:

BC = Lerp(WC, Overlay(CC + CH + CG, CL), α)

where BC is the blended colour adjustment and α is the alpha value of the Lip Colour Property 26 a-1.

- Alpha Blend Adjustment:

OT = AlphaBlend(BC, WC, Gaussian(LM))

where OT is the ‘Output Texture’ pixel colour, and LM is the ‘Lip Mask Texture’ pixel colour.

FIG. 22b illustrates a second example of a lip shader module 22-2 for applying colourisation to a portion of the captured image data, based on a corresponding portion of a lipstick detail texture 20-1. As in the first example, the lip mask 14 b-1 defines the masked portion as the lips of a face in the captured image data. However, in this example, the lipstick shader module 22-2 is configured to use a different set of shader sub-modules 32 than the first example above. Additionally, instead of applying the alpha blend to the captured and adjusted image data, an adjusted colour value for each pixel is output as the resulting colourised texture data, along with a corresponding calculated alpha value for each pixel. Accordingly, as shown in FIG. 22b, an alpha blend calculation sub-module 32-7 calculates the respective alpha blend values for the output texture portion by first receiving output data from a highlight adjustment sub-module 32-1 and a glitter adjustment sub-module 32-4, and adding the received data together at a step G18 based on an intensity property 26 a-7. The output of step G18 is then multiplied with data of the warped portion of the lip mask 14 b-1 at step G15, and further processed in a subsequent saturation step G19. The intensity property 26 a-7 is also used by the glitter adjustment sub-module 32-4 as a further parameter to control the glitter adjustment.

A colour adjustment sub-module 32-3 is used to apply the lip colourproperty 26 a-1 to a greyscale version of the portion of the capturedimage data to determine the colour values for the output texture. Inthis example, the colour adjustment sub-module 32-3 is configured toapply a “hard light” blend at a modified step G13, to combine the lipcolour property 26 a-1 with the greyscale captured image data. Forexample, the operation can apply the property b to each input pixel a asfollows:

${f_{hardlight}\left( {a,b} \right)} = \left\{ \begin{matrix}{{2{ab}},} & {{{if}\mspace{14mu} b} < 0.5} \\{{1 - {2\left( {1 - a} \right)\left( {1 - b} \right)}},} & {otherwise}\end{matrix} \right.$

FIG. 22c illustrates an example of a foundation shader module 22-3 forapplying colourisation to another portion of the captured image data,based on a corresponding warped portion of a face mask 14 b-2. In thisexample, the face mask 14 b-2 defines the masked portion as the skinportion of a face in the captured image data, for example as shown inFIG. 7d . As shown in FIG. 22c , a blend colour adjustment sub-module32-2 linearly interpolates the captured image data from the camera 9with a blurred version of the captured image data, based on the weightedoutput of a smooth mask sub-module 32-7. The smooth mask sub-module 32-7performs processing at a step G18 to add the face mask data 14 b-2 witha ramped greyscale version of the captured image data, based on anintensity property 26 a-7 and a smooth property 26 a-8, and adjusts thesaturation of the output at a step G19.

FIG. 23 schematically illustrates an example process for generating ablurred version of the captured image data, which is particularlyoptimal in the context of applying virtual foundation make-up in atracking and augmenting system 1. As shown in FIG. 23, a blurringsub-module 32-8 receives the captured image data from the camera 9. At astep B3, the captured image data is blurred by downsampling the inputimage data to a lower resolution. At a step B4, a threshold function isapplied to the pixel values of the captured image data, for example by afunction:

$f(x) = \mathrm{Greyscale}(x)^{2.5} \cdot 5.0$

At a step B5, the thresholded image data is multiplied by the face mask14 b-2, retrieved from the object model database 7, to discard pixelsoutside the masked face region. At a step B6, the blurred image data ismixed with the result of step B5, resulting in the discarding of pixelsoutside the masked face region and discarding of dark features from theinput captured image data. At a step B7, the result of step B6 is alphablended with the original captured image data. Advantageously, theblurring sub-module 32-8 outputs a resulting image with softened facialfeatures, while maintaining sharp facial features. Although the blurringprocess in FIG. 23 is described as applied to the entire image ascaptured by the camera 9, it is appreciated that the blurring processcan be applied to just the masked region of the captured image data forimproved efficiencies.
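A condensed sketch of this blurring pipeline (steps B3 to B7) is given below; the downscale factor, the clipping of the threshold output and the function name foundation_blur are assumptions made for the example.

import cv2
import numpy as np

def foundation_blur(frame, face_mask, downscale=4):
    """Soften skin inside the face mask while keeping dark, sharp facial features.

    frame     : uint8 BGR captured image
    face_mask : float mask in [0, 1], 1 inside the masked skin region
    """
    h, w = frame.shape[:2]
    # blur by downsampling to a lower resolution and resizing back up
    small = cv2.resize(frame, (w // downscale, h // downscale), interpolation=cv2.INTER_AREA)
    blurred = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR).astype(np.float32) / 255.0
    img = frame.astype(np.float32) / 255.0
    # threshold on brightness: dark features (eyes, nostrils, lip line) get a low weight
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    weight = np.clip(grey ** 2.5 * 5.0, 0.0, 1.0) * face_mask
    # alpha blend: blurred skin where the weight is high, original pixels elsewhere
    out = blurred * weight[..., None] + img * (1.0 - weight[..., None])
    return (out * 255).astype(np.uint8)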

FIG. 22d illustrates an example of a blusher and eye shadow shader module 22-4 for applying colourisation to yet other portions of the captured image data, based on a corresponding portion of a blusher or eye mask 14 b-3. In this example, the blusher or eye mask 14 b-3 defines the masked portion as the cheek or eye portions of a face in the captured image data, for example as shown in FIG. 7d. As shown in FIG. 22d, the colour values of the output texture portion are calculated by applying adjustments to the corresponding portion of the captured image data using the colour adjustment sub-module 32-3 and the blend colour adjustment sub-module 32-2, similarly to the examples discussed above. The alpha blend calculation sub-module 32-7 calculates the corresponding alpha values for the output texture portion, based on the received output from the glitter adjustment sub-module 32-4, an intensity property 26 a-7, and the warped region of the blush or eye mask data 14 b-3, in a similar manner to the examples discussed above.

Computer Systems

The modules described herein, such as the training, tracking andcolourisation modules, may be implemented by a computer system orsystems, such as computer system 1000 as shown in FIG. 24. Embodimentsof the present invention may be implemented as programmable code forexecution by such computer systems 1000. After reading this description,it will become apparent to a person skilled in the art how to implementthe invention using other computer systems and/or computerarchitectures.

Computer system 1000 includes one or more processors, such as processor1004. Processor 1004 may be any type of processor, including but notlimited to a special purpose or a general-purpose digital signalprocessor. Processor 1004 is connected to a communication infrastructure1006 (for example, a bus or network). Various software implementationsare described in terms of this exemplary computer system. After readingthis description, it will become apparent to a person skilled in the arthow to implement the invention using other computer systems and/orcomputer architectures.

Computer system 1000 also includes a user input interface 1003 connectedto one or more input device(s) 1005 and a display interface 1007connected to one or more display(s) 1009. Input devices 1005 mayinclude, for example, a pointing device such as a mouse or touchpad, akeyboard, a touchscreen such as a resistive or capacitive touchscreen,etc. After reading this description, it will become apparent to a personskilled in the art how to implement the invention using other computersystems and/or computer architectures, for example using mobileelectronic devices with integrated input and display components.

Computer system 1000 also includes a main memory 1008, preferably random access memory (RAM), and may also include a secondary memory 1010. Secondary memory 1010 may include, for example, a hard disk drive 1012 and/or a removable storage drive 1014, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. Removable storage drive 1014 reads from and/or writes to a removable storage unit 1018 in a well-known manner. Removable storage unit 1018 represents a floppy disk, magnetic tape, optical disk, etc., which is read by and written to by removable storage drive 1014. As will be appreciated, removable storage unit 1018 includes a computer usable storage medium having stored therein computer software and/or data.

In alternative implementations, secondary memory 1010 may include othersimilar means for allowing computer programs or other instructions to beloaded into computer system 1000. Such means may include, for example, aremovable storage unit 1022 and an interface 1020. Examples of suchmeans may include a program cartridge and cartridge interface (such asthat previously found in video game devices), a removable memory chip(such as an EPROM, or PROM, or flash memory) and associated socket, andother removable storage units 1022 and interfaces 1020 which allowsoftware and data to be transferred from removable storage unit 1022 tocomputer system 1000. Alternatively, the program may be executed and/orthe data accessed from the removable storage unit 1022, using theprocessor 1004 of the computer system 1000.

Computer system 1000 may also include a communication interface 1024.Communication interface 1024 allows software and data to be transferredbetween computer system 1000 and external devices. Examples ofcommunication interface 1024 may include a modem, a network interface(such as an Ethernet card), a communication port, a Personal ComputerMemory Card International Association (PCMCIA) slot and card, etc.Software and data transferred via communication interface 1024 are inthe form of signals 1028, which may be electronic, electromagnetic,optical, or other signals capable of being received by communicationinterface 1024. These signals 1028 are provided to communicationinterface 1024 via a communication path 1026. Communication path 1026carries signals 1028 and may be implemented using wire or cable, fibreoptics, a phone line, a wireless link, a cellular phone link, a radiofrequency link, or any other suitable communication channel. Forinstance, communication path 1026 may be implemented using a combinationof channels.

The terms “computer program medium” and “computer usable medium” are used generally to refer to media such as removable storage drive 1014 and a hard disk installed in hard disk drive 1012. These computer program products are means for providing software to computer system 1000. The terms may also include signals, such as signals 1028 (electrical, optical or electromagnetic signals), that embody the computer program disclosed herein.

Computer programs (also called computer control logic) are stored in main memory 1008 and/or secondary memory 1010. Computer programs may also be received via communication interface 1024. Such computer programs, when executed, enable computer system 1000 to implement embodiments of the present invention as discussed herein. Accordingly, such computer programs represent controllers of computer system 1000. Where the embodiment is implemented using software, the software may be stored in a computer program product 1030 and loaded into computer system 1000 using removable storage drive 1014, hard disk drive 1012, or communication interface 1024, to provide some examples.

Alternative embodiments may be implemented as control logic in hardware, firmware, or software, or any combination thereof.

ALTERNATIVE EMBODIMENTS

It will be understood that embodiments of the present invention are described herein by way of example only, and that various changes and modifications may be made without departing from the scope of the invention. Further alternative embodiments may be envisaged, which nevertheless fall within the scope of the following claims. Non-limiting implementation sketches of certain claimed features are provided, by way of example only, after the claims.

CLAIMS

1. A computer-implemented method of generating augmented image data, the method comprising the computer-implemented steps of: storing, in a memory, data defining a plurality of virtual makeup products, each associated with colourisation parameters; receiving, via a user interface, user input selection of at least one of said virtual makeup products; retrieving, from the memory, the colourisation parameters associated with the or each selected virtual makeup product; receiving data of an image captured by a camera; and augmenting the captured image data based on the retrieved colourisation parameters.
2. The method of claim 1, further comprising providing a plurality of shader modules, each configured to augment image data based on one or more colourisation parameters, wherein one or more of said shader modules is used to augment the captured image data.
3. The method of claim 2, wherein each virtual makeup product is further associated with one or more of said shader modules.
4. The method of claim 1, further comprising generating at least one augmentation representation based on respective predefined mask data identifying coordinates of a plurality of masked pixels, wherein the captured image data is augmented further based on the generated at least one augmentation representation.
5. The method of claim 4, wherein generating the at least one augmentation representation comprises: receiving data identifying coordinates of a plurality of labelled feature points defining a detected object in the captured image; generating at least one augmentation representation based on respective predefined mask data identifying coordinates of a plurality of masked pixels, by: retrieving data defining a plurality of polygonal regions of augmentation image data determined for the detected object, wherein the augmentation image data defines a plurality of augmentation values to be applied to the captured image data, and wherein each polygonal region is defined by three or more vertices, each vertex of the three or more vertices being associated with a corresponding labelled feature point; placing the retrieved plurality of polygonal regions over the respective mask data; identifying polygonal regions that include at least one masked pixel; and storing data representing the identified subset of polygonal regions.
6. The method of claim 1, further comprising locating an object in the captured image, by: storing a representation of the object, the representation including data defining a first object model and a corresponding trained function to fit the first object model to the captured image data, and data defining at least one second object model comprising a subset of the data defining the first object model, and at least one corresponding trained function to fit the respective second object model to the captured image data, wherein the first object model defines a shape of the whole object and the at least one second object model defines a shape of a portion of the object; determining an approximate location of the object in the captured image, by generating a candidate global shape of the object based on the first object model, and using the corresponding trained function to update the candidate global shape based on the captured image data; and refining the location of the object in the captured image by splitting the candidate global shape into one or more candidate object sub-shapes based on the at least one second object model, and determining a location of the one or more candidate object sub-shapes based on the second object model and its corresponding trained function.
7. A system comprising one or more processors configured to perform processing to generate augmented image data by: storing, in a memory, data defining a plurality of virtual makeup products, each associated with colourisation parameters; receiving, via a user interface, user input selection of at least one of said virtual makeup products; retrieving, from the memory, the colourisation parameters associated with the or each selected virtual makeup product; receiving data of an image captured by a camera; and augmenting the captured image data based on the retrieved colourisation parameters.
8. The system of claim 7, wherein the one or more processors are further configured to provide a plurality of shader modules, each configured to augment image data based on one or more colourisation parameters, wherein one or more of said shader modules is used to augment the captured image data.
9. The system of claim 8, wherein each virtual makeup product is further associated with one or more of said shader modules.
10. The system of claim 7, wherein the one or more processors are further configured to generate at least one augmentation representation based on respective predefined mask data identifying coordinates of a plurality of masked pixels, wherein the captured image data is augmented further based on the generated at least one augmentation representation.
11. The system of claim 10, wherein the one or more processors are further configured to generate the at least one augmentation representation by: receiving data identifying coordinates of a plurality of labelled feature points defining a detected object in the captured image; generating at least one augmentation representation based on respective predefined mask data identifying coordinates of a plurality of masked pixels, by: retrieving data defining a plurality of polygonal regions of augmentation image data determined for the detected object, wherein the augmentation image data defines a plurality of augmentation values to be applied to the captured image data, and wherein each polygonal region is defined by three or more vertices, each vertex of the three or more vertices being associated with a corresponding labelled feature point; placing the retrieved plurality of polygonal regions over the respective mask data; identifying polygonal regions that include at least one masked pixel; and storing data representing the identified subset of polygonal regions.
12. The system of claim 7, wherein the one or more processors are further configured to locate an object in the captured image by: storing a representation of the object, the representation including data defining a first object model and a corresponding trained function to fit the first object model to the captured image data, and data defining at least one second object model comprising a subset of the data defining the first object model, and at least one corresponding trained function to fit the respective second object model to the captured image data, wherein the first object model defines a shape of the whole object and the at least one second object model defines a shape of a portion of the object; determining an approximate location of the object in the captured image, by generating a candidate global shape of the object based on the first object model, and using the corresponding trained function to update the candidate global shape based on the captured image data; and refining the location of the object in the captured image by splitting the candidate global shape into one or more candidate object sub-shapes based on the at least one second object model, and determining a location of the one or more candidate object sub-shapes based on the second object model and its corresponding trained function.
13. A non-transitory computer-readable medium comprising computer-executable instructions that, when executed, perform the method of generating augmented image data by: storing, in a memory, data defining a plurality of virtual makeup products, each associated with colourisation parameters; receiving, via a user interface, user input selection of at least one of said virtual makeup products; retrieving, from the memory, the colourisation parameters associated with the or each selected virtual makeup product; receiving data of an image captured by a camera; and augmenting the captured image data based on the retrieved colourisation parameters.
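
By way of illustration only, the sketches below indicate one possible software realisation of certain claimed features; they form no part of the claims, and every identifier used (for example VirtualProduct, alpha_blend_shader or ShapeModel) is a hypothetical name assumed for the example rather than an element disclosed above.

A minimal sketch of the product-selection and colourisation flow of claims 1 to 3, assuming NumPy image buffers, a stored product catalogue and a single alpha-blending shader module:

# Illustrative sketch only; all names are hypothetical assumptions.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class VirtualProduct:
    """A stored virtual makeup product with its colourisation parameters."""
    name: str
    colour: tuple          # (R, G, B) in 0..255
    opacity: float         # 0.0 .. 1.0
    shaders: list = field(default_factory=lambda: ["alpha_blend"])


def alpha_blend_shader(image, mask, colour, opacity):
    """One possible shader module: blend a flat colour into the masked pixels."""
    out = image.astype(np.float32)
    colour = np.asarray(colour, dtype=np.float32)
    alpha = (mask.astype(np.float32) * opacity)[..., None]   # HxWx1 blend weights
    out = out * (1.0 - alpha) + colour * alpha
    return out.clip(0, 255).astype(np.uint8)


# Registry of shader modules, each driven by colourisation parameters.
SHADER_MODULES = {"alpha_blend": alpha_blend_shader}

# Stored catalogue of virtual makeup products and their colourisation parameters.
CATALOGUE = {
    "coral_lipstick": VirtualProduct("coral_lipstick", (255, 96, 64), 0.6),
}


def augment(image, mask, product_name):
    """Retrieve the selected product's colourisation parameters and apply its
    associated shader module(s) to the captured image data."""
    product = CATALOGUE[product_name]          # stands in for the user's selection
    out = image
    for shader_name in product.shaders:        # a product may use several shaders
        shader = SHADER_MODULES[shader_name]
        out = shader(out, mask, product.colour, product.opacity)
    return out

Here the catalogue plays the role of the stored virtual makeup products, the dictionary lookup stands in for the user's selection via the user interface, and the shader registry allows each product to be associated with one or more shader modules as in claims 2 and 3.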
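
A sketch of the representation-generation step of claims 5 and 11, assuming each polygonal region is a triangle whose vertices index into the labelled feature points of the detected object; triangle_covers_mask is a hypothetical helper using a standard barycentric point-in-triangle test:

# Illustrative sketch only; names and data layout are assumptions.
import numpy as np


def triangle_covers_mask(vertices, mask_coords):
    """Return True if any masked pixel lies inside the triangle."""
    a, b, c = (np.asarray(v, dtype=np.float64) for v in vertices)
    p = mask_coords.astype(np.float64)            # N x 2 masked pixel (x, y) coords
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    if abs(denom) < 1e-12:                        # degenerate triangle
        return False
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    u = 1.0 - v - w
    return bool(np.any((u >= 0) & (v >= 0) & (w >= 0)))


def build_augmentation_representation(triangles, feature_points, mask):
    """Keep only the polygonal regions that include at least one masked pixel."""
    ys, xs = np.nonzero(mask)                     # coordinates of the masked pixels
    mask_coords = np.stack([xs, ys], axis=1)
    kept = []
    for tri in triangles:                         # tri = indices of 3 feature points
        verts = [feature_points[i] for i in tri]
        if triangle_covers_mask(verts, mask_coords):
            kept.append(tri)
    return kept                                   # data representing the kept subset

The returned list is the stored data representing the identified subset of polygonal regions, i.e. only those regions that include at least one masked pixel.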
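
A sketch of the coarse-to-fine object location of claims 6 and 12, assuming the trained functions are available as callables (for example, learned regressors) and that the hypothetical ShapeModel class pairs a mean shape with such a callable and, for the second object models, the indices of its subset of feature points:

# Illustrative sketch only; ShapeModel and locate_object are assumed names.
import numpy as np


class ShapeModel:
    """Mean shape plus a trained update function (e.g. a learned regressor)."""

    def __init__(self, mean_shape, update_fn, point_indices=None):
        self.mean_shape = np.asarray(mean_shape, dtype=np.float64)
        self.update_fn = update_fn          # trained function: (image, shape) -> shape
        self.point_indices = point_indices  # subset of the global model's points


def locate_object(image, global_model, sub_models, iterations=3):
    # Step 1: approximate location from the first (global) object model.
    shape = global_model.mean_shape.copy()
    for _ in range(iterations):
        shape = global_model.update_fn(image, shape)

    # Step 2: split the candidate global shape into sub-shapes and refine each
    # with its own second object model and corresponding trained function.
    for sub in sub_models:
        sub_shape = shape[sub.point_indices]
        for _ in range(iterations):
            sub_shape = sub.update_fn(image, sub_shape)
        shape[sub.point_indices] = sub_shape
    return shape

The first loop produces the approximate, candidate global shape; the second splits it into sub-shapes and refines each with its own model and trained function, as recited in the claims.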